Christopher Desjardins
2011-Apr-06 20:44 UTC
[R] Getting number of students with zeroes in long format
Hi,

I have longitudinal school suspension data on students. I would like to figure out how many students (id_r) have no suspensions (sus), i.e. have a code of '0'. My data is in long format and the first 20 records look like the following:

> suslm[1:20,c(1,7)]
 id_r sus
   11   0
   15  10
   16   0
   18   0
   19   0
   19   0
   20   0
   21   0
   21   0
   22   0
   24   0
   24   0
   25   3
   26   0
   26   0
   30   0
   30   0
   31   0
   32   0
   33   0

Each id_r is unique and I'd like to know the number of id_r that have a 0 for sus, not the total number of 0s. Does that make sense?

Thanks!
Chris
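A minimal sketch of the distinction drawn above between counting zero records and counting students. It rebuilds only the 20 rows printed above in a hypothetical data frame `sus_sample` (the real `suslm` has more rows and columns), so the numbers apply to this excerpt only.

```r
# Hypothetical reconstruction of the 20 rows printed above
# (only id_r and sus; the real suslm has more columns).
sus_sample <- data.frame(
  id_r = c(11, 15, 16, 18, 19, 19, 20, 21, 21, 22,
           24, 24, 25, 26, 26, 30, 30, 31, 32, 33),
  sus  = c( 0, 10,  0,  0,  0,  0,  0,  0,  0,  0,
            0,  0,  3,  0,  0,  0,  0,  0,  0,  0)
)

# Total number of zero records: a student seen twice counts twice.
n_zero_rows <- sum(sus_sample$sus == 0)

# Number of distinct students with at least one zero record.
n_students_with_zero <- length(unique(sus_sample$id_r[sus_sample$sus == 0]))

n_zero_rows           # [1] 18
n_students_with_zero  # [1] 13
```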
Jorge Ivan Velez
2011-Apr-06 20:58 UTC
Hi Chris,

Is this what you have in mind?

> sum(with(yourdata, tapply(sus, id_r, function(x) any(x==0))))
[1] 13

HTH,
Jorge
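To see what this one-liner does, here it is run on a hypothetical `sus_sample` data frame rebuilt from the 20 rows in the original post (an assumption; the full data are not in the thread). `tapply()` returns one logical per student, and `sum()` counts the TRUE values. Note that `any(x == 0)` counts students with at least one zero, which includes students who were suspended at some other time.

```r
# Hypothetical reconstruction of the 20 rows from the original post.
sus_sample <- data.frame(
  id_r = c(11, 15, 16, 18, 19, 19, 20, 21, 21, 22,
           24, 24, 25, 26, 26, 30, 30, 31, 32, 33),
  sus  = c( 0, 10,  0,  0,  0,  0,  0,  0,  0,  0,
            0,  0,  3,  0,  0,  0,  0,  0,  0,  0)
)

# tapply() splits sus by id_r and applies the function to each student's
# values, giving one named logical per student.
has_zero <- with(sus_sample, tapply(sus, id_r, function(x) any(x == 0)))

# sum() of a logical vector counts the TRUE entries.
sum(has_zero)                # [1] 13

# Students without any zero in this excerpt.
names(has_zero)[!has_zero]   # [1] "15" "25"
```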
Douglas Bates
2011-Apr-06 21:03 UTC
On Wed, Apr 6, 2011 at 3:44 PM, Christopher Desjardins <cddesjardins at gmail.com> wrote:
> Each id_r is unique and I'd like to know the number of id_r that have a 0
> for sus not the total number of 0. Does that make sense?

You say you have longitudinal data, so may we assume that a particular id_r can occur multiple times in the data set? It is not clear to me what you want the result to be for students who have no suspensions at one time but may have a suspension at another time. Are you interested in the number of students who have only zeros in the sus column?

One way to approach this task is to use tapply. I would create a data frame and convert id_r to a factor.

df <- within(as.data.frame(suslm), id_r <- factor(id_r))
counts <- with(df, tapply(sus, id_r, function(sus) all(sus == 0)))

The tapply function will split the vector sus according to the levels of id_r and apply the function to the subvectors. I just saw Jorge's response, and he uses the same tactic, but he is looking for students who had any value of sus == 0.
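A sketch of the difference between the two criteria, using the 23 rows from the tail of `suslm` that Chris posts later in this thread, copied into a hypothetical data frame `tail_rows`. Students 999881 and 999896 have both zero and non-zero records, so they are counted by `any(s == 0)` but not by `all(s == 0)`.

```r
# Hypothetical copy of the 23 tail rows of suslm posted later in the thread.
tail_rows <- data.frame(
  id_r = rep(c(999881, 999886, 999890, 999892, 999896, 999897),
             times = c(4, 4, 4, 4, 4, 3)),
  sus  = c(1, 7, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
           0, 0, 0, 0,  0, 4, 3, 0,  0, 0, 0)
)

# As suggested above: make id_r a factor so tapply() groups by student.
df <- within(tail_rows, id_r <- factor(id_r))

# TRUE only for students whose every sus value is zero (never suspended).
only_zeros <- with(df, tapply(sus, id_r, function(s) all(s == 0)))

# TRUE for students with at least one zero (the any() criterion).
any_zero <- with(df, tapply(sus, id_r, function(s) any(s == 0)))

sum(only_zeros)  # [1] 4
sum(any_zero)    # [1] 6
```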
Christopher Desjardins
2011-Apr-07 13:07 UTC
Hi Jorge,

I want to make sure this does what I want. So I want to get a count of students that never get a suspension. Once a student has a non-zero I don't want to count that student. Each id_r may be associated with multiple sus. Are these commands doing this? Because ...

> suslm[175953:nrow(suslm),c("id_r","sus")]
           id_r sus
999881.5 999881   1
999881.6 999881   7
999881.7 999881   0
999881.8 999881   0
999886.5 999886   0
999886.6 999886   0
999886.7 999886   0
999886.8 999886   0
999890.5 999890   0
999890.6 999890   0
999890.7 999890   0
999890.8 999890   0
999892.5 999892   0
999892.6 999892   0
999892.7 999892   0
999892.8 999892   0
999896.5 999896   0
999896.6 999896   4
999896.7 999896   3
999896.8 999896   0
999897.5 999897   0
999897.6 999897   0
999897.7 999897   0

> tail(with(suslm,tapply(sus,id_r,function(x) any(x==0))))
999881 999886 999890 999892 999896 999897
  TRUE   TRUE   TRUE   TRUE   TRUE   TRUE
> r <- with(suslm, tapply(sus, id_r, function(x) any(x > 0)))
> tail(with(suslm, tapply(sus, id_r, function(x) any(x > 0))))
999881 999886 999890 999892 999896 999897
  TRUE  FALSE  FALSE  FALSE   TRUE  FALSE

Based on this, 999881 and 999896 should be FALSE, not TRUE. I would expect that if they were TRUE for the first command, they should be FALSE for the second command, right?

> tail(names(r[ r == TRUE ]))
[1] "999752" "999767" "999806" "999807" "999881" "999896"
> tail(names(r[ r == FALSE ]))
[1] "999869" "999870" "999886" "999890" "999892" "999897"

This command seems to do the right thing. Is that right?

On Wed, Apr 6, 2011 at 10:25 PM, Jorge Ivan Velez <jorgeivanvelez@gmail.com> wrote:
> Hi Chris,
>
> Sorry I did not see your email before ;-) Here is one option:
>
> > r <- with(d, tapply(sus, id_r, function(x) any(x > 0)))
> > r
>    11    15    16    18    19    20    21    22    24    25    26    30    31    32
> FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE
>    33
> FALSE
> > names(r[ r == TRUE ])
> [1] "15" "25"
>
> Regards,
> Jorge
>
> On Wed, Apr 6, 2011 at 5:03 PM, Christopher Desjardins wrote:
>> Thanks. And how many could I find that have greater than 0?
>> Chris
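The two tapply() results above are not complements: any(x == 0) and any(x > 0) are both TRUE for a student with a mix of zero and non-zero records, which is what happens for 999881 and 999896. The complement of any(x > 0) is all(x <= 0), which equals all(x == 0) when sus cannot be negative (an assumption about these data). A sketch on the tail rows above, copied into a hypothetical data frame `tail_rows`:

```r
# Hypothetical copy of the 23 tail rows of suslm shown above.
tail_rows <- data.frame(
  id_r = rep(c(999881, 999886, 999890, 999892, 999896, 999897),
             times = c(4, 4, 4, 4, 4, 3)),
  sus  = c(1, 7, 0, 0,  0, 0, 0, 0,  0, 0, 0, 0,
           0, 0, 0, 0,  0, 4, 3, 0,  0, 0, 0)
)

# Second command: TRUE if the student was ever suspended.
r <- with(tail_rows, tapply(sus, id_r, function(x) any(x > 0)))
# First command: TRUE if the student has at least one zero.
any_zero <- with(tail_rows, tapply(sus, id_r, function(x) any(x == 0)))

# Students that are TRUE under both commands (mixed zero/non-zero records).
names(r)[r & any_zero]    # [1] "999881" "999896"

# Never suspended = not ever suspended; with sus >= 0 this equals all(x == 0).
only_zeros <- with(tail_rows, tapply(sus, id_r, function(x) all(x == 0)))
all((!r) == only_zeros)   # [1] TRUE

# Number of students never suspended in this excerpt.
sum(!r)                   # [1] 4
```

`table(r)` reports both counts at once. If sus can contain NA, any() and all() may return NA for some students, so it is worth checking for missing values first.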