Dear R-helpers,

I need to count the maximum number of consecutive zero values of a variable in a data frame, separately for each group. My data frame looks like this:

ID <- c(1,1,1,2,2,3,3,3,3)
x <- c(1,0,0,0,0,1,1,0,1)
df <- data.frame(ID=ID,x=x)
rm(ID,x)

So I want to get the max number of consecutive zeros of variable x for each ID. I found rle() to be helpful for this task, so I did:

FUN <- function(x) {
  rles <- rle(x == 0)
}
consec <- lapply(split(df[,2],df[,1]), FUN)

consec is now a list holding one rle object per ID. Each object contains $lengths (int), the length of every run of identical values, and $values (logi), indicating whether that run consists of zeros or not.

Unfortunately I'm not very experienced with lists. Could you help me extract the max number of consecutive zeros for each ID and return the result as a data frame containing ID and max number of consecutive zeros?

Different approaches are also welcome. Since the real data frame is quite large, a fast solution is appreciated.

Best regards,
Carlos

-- 
Carlos Nasher
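For readers unfamiliar with what rle() returns, here is a minimal sketch using only the x values of ID 3 from the question. The names x3, r, zero_runs and longest are illustrative and not part of the original post.

```r
# x values belonging to ID 3 in the question's example data
x3 <- c(1, 1, 0, 1)

# run-length encode the logical vector "is this value zero?"
r <- rle(x3 == 0)

# r$lengths holds the length of each run,
# r$values is TRUE for runs made up of zeros
str(unclass(r))

# keep the zero runs only, then take the longest (0 if there are none)
zero_runs <- r$lengths[r$values]
longest <- max(0, zero_runs)
longest
```

The same two steps (subset the lengths by the values, then take the maximum) can be applied to each element of the list that lapply() produced in the question.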
> -----Original Message-----
> So I want to get the max number of consecutive zeros of variable x for each
> ID. I found rle() to be helpful for this task; so I did:
>
> FUN <- function(x) {
>   rles <- rle(x == 0)
> }
> consec <- lapply(split(df[,2],df[,1]), FUN)

You're probably better off with tapply and a function that returns what you want. You're probably also better off with a data frame name that isn't a function name, so I'll use dfr instead of df...

# 5 ID groups numbered 1-5; equal size, but that doesn't matter for tapply
dfr <- data.frame(x=rpois(500, 1.5), ID=gl(5,100))

f2 <- function(x) {
  r <- rle(x == 0)
  # keep only the runs of zeros (r$values is TRUE there); 0 if x has no zeros
  max(0, r$lengths[r$values])
}
with(dfr, tapply(x, ID, f2))

S Ellison
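The question asked for a data frame with one row per ID, whereas tapply returns a named array. A small sketch of that last step on the data from the original question follows; max_zero_run, res, out and the column name max_zeros are illustrative names, not from the thread.

```r
# the data from the original question, named dfr to avoid masking stats::df
dfr <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                  x  = c(1, 0, 0, 0, 0, 1, 1, 0, 1))

# longest run of zeros in a vector; 0 if the vector contains no zeros
max_zero_run <- function(x) {
  r <- rle(x == 0)
  max(0L, r$lengths[r$values])
}

# one value per ID, returned as an array named by ID
res <- with(dfr, tapply(x, ID, max_zero_run))

# reshape the named array into the requested data frame
out <- data.frame(ID = as.numeric(names(res)),
                  max_zeros = as.vector(res))
out
```

Converting names(res) back with as.numeric() assumes the IDs are numeric, as they are in the question; for character or factor IDs the names could be kept as they are.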
Hi,

Maybe this helps:

fun1 <- function(dat){
  lst1 <- lapply(split(dat,dat$ID),function(y){
    rl <- rle(y$x)
    # the leading 0 covers IDs that contain no zeros at all
    data.frame(ID=unique(y$ID),MAXZero=max(0,rl$lengths[rl$values==0]))
  })
  do.call(rbind,lst1)
}
fun1(df)
#  ID MAXZero
#1  1       2
#2  2       2
#3  3       1

A.K.

On Thursday, October 31, 2013 7:22 AM, Carlos Nasher wrote:
> I need to count the maximum number of consecutive zero values of a variable
> in a dataframe by different groups.
> [...]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi Carlos,

With Bioconductor, this can simply be done with:

library(IRanges)
ID <- Rle(1:3, c(3,2,4))
x <- Rle(c(1,0,0,0,0,1,1,0,1))
groups <- split(x, ID)
idx <- groups == 0

Then:

> max(runLength(idx)[runValue(idx)])
1 2 3
2 2 1

Should be fast even with hundreds of thousands of groups (should take < 10 sec).

HTH,
H.

On 10/31/2013 04:20 AM, Carlos Nasher wrote:
> I need to count the maximum number of consecutive zero values of a variable
> in a dataframe by different groups.
> [...]

-- 
Hervé Pagès
Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
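For readers without Bioconductor, here is a base-R sketch that avoids split() by marking run boundaries across the whole data frame in one pass. It is not the method from any reply above, and it assumes that the rows of each ID are stored contiguously (as in the question) and that the data frame has at least one row. All variable names are illustrative, and no timing has been measured.

```r
# the data from the original question
dfr <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                  x  = c(1, 0, 0, 0, 0, 1, 1, 0, 1))

is_zero <- dfr$x == 0
n <- nrow(dfr)

# a new run starts wherever the ID or the zero/non-zero status changes
new_run <- c(TRUE,
             dfr$ID[-1] != dfr$ID[-n] | is_zero[-1] != is_zero[-n])

run_id    <- cumsum(new_run)    # run number of every row
run_len   <- tabulate(run_id)   # length of every run
run_start <- which(new_run)     # first row of every run
run_zero  <- is_zero[run_start] # TRUE if the run consists of zeros
run_group <- dfr$ID[run_start]  # ID that the run belongs to

# non-zero runs count as length 0, so IDs without zeros end up with 0
longest <- tapply(run_len * run_zero, run_group, max)

out <- data.frame(ID = as.numeric(names(longest)),
                  max_zeros = as.vector(longest))
out
```

If the rows are not already grouped by ID, sorting with order(dfr$ID) first would restore the contiguity this sketch relies on.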