I'm trying to use rle() and data.table together and wondering whether this is possible.

To keep this simple: my ultimate goal is to determine long stretches of 1s, but I want to do this within groups (hence using data.table, as I use the "set key" option). However, I'm not having much luck making this work.

For example, for simplicity's sake, I have the following data:

Dad Mum Child Group
AA  RR  RA    A
AA  RR  RR    A
AA  AA  AA    B
AA  AA  AA    B
RA  AA  RR    B
RR  AA  RR    B
AA  AA  AA    B
AA  AA  RA    C
AA  AA  RA    C
AA  RR  RA    C

And the following code, which I know works:

hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]

hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]

hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]

However, I wish to run the above code by Group (the real file is millions of rows long and the groups will be larger; I just wanted to simplify the example).

I tried something like this but, of course, got an error:

LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]

The reason is that I eventually want something like this:

Dad Mum Child Group sumdad summum sumchild
AA  RR  RA    A          2      2        0
AA  RR  RR    A          2      2        1
AA  AA  AA    B          4      5        5
AA  AA  AA    B          4      5        5
RA  AA  RR    B          0      5        5
RR  AA  RR    B          4      5        5
AA  AA  AA    B          4      5        5
AA  AA  RA    C          3      3        0
AA  AA  RA    C          3      3        0
AA  RR  RA    C          3      3        0

That is, I would like to have the specific counts next to what I'm consecutively counting, per group. So for Group A, Dad has 2 AAs and Mum has 2 RRs, but the Child has only 1 AA or RR, and that is an RR (so the 1 sits next to the RR and not the RA).

Can this be done?

K.
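The stated goal (stretches of AA/RR within each group, reported on every row) can be sketched in base R without data.table. This is a minimal sketch under assumptions: AA/RR counts as 1 and anything else as 0, a non-AA/RR row gets 0, and the helper run_length_per_row() and the run* column names are hypothetical. The key step is expanding the rle() summary back to one value per row with rep(), so the result lines up with the rows of each group.

```r
# Example data from the question
x <- read.table(text = "Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA C
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: for one group's 0/1 vector, give each row the length
# of the run of 1s it belongs to, and 0 for rows that are 0.
# rep() expands the per-run summary from rle() back to one value per row.
run_length_per_row <- function(hom) {
  r <- rle(hom)
  rep(ifelse(r$values == 1L, r$lengths, 0L), times = r$lengths)
}

for (col in c("Dad", "Mum", "Child")) {
  hom <- as.integer(x[[col]] %in% c("AA", "RR"))  # 1 = AA or RR, 0 otherwise
  # ave() splits hom by Group, applies the helper, and reassembles in row order
  x[[paste0("run", tolower(col))]] <- ave(hom, x$Group, FUN = run_length_per_row)
}
x
```

With this data the run lengths differ from the desired table in one place: Dad in group B gets 2, 2, 0, 2, 2 rather than 4, 4, 0, 4, 4, because the RA row splits the AA/RR rows into two runs of 2. So the desired table counts AA/RR rows per group rather than measuring consecutive stretches. The missing expansion step is also the most likely cause of the data.table error: rle(...)$lengths[...] has one entry per run, not per row, so it cannot line up with the rows when assigned with := by Group. In addition, x[c(1)] inside LOH[...] presumably refers to the separate object x rather than a column of LOH.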
I do not understand the value of using the rle function in your description, but the code below appears to produce the table you want.

Note that better support for the data.table package might be found at stackexchange, as the documentation specifies.

x <- read.table( text=
"Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA C
", header=TRUE, stringsAsFactors=FALSE )

library(data.table)
DT <- data.table( x )
DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
DT[ , sumdad := 0L ]
DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
DT[ , cdad := NULL ]
DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
DT[ , summum := 0L ]
DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
DT[ , cmum := NULL ]
DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
DT[ , sumchild := 0L ]
DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
DT[ , cchild := NULL ]

> DT
    Dad Mum Child Group sumdad summum sumchild
 1:  AA  RR    RA     A      2      2        0
 2:  AA  RR    RR     A      2      2        1
 3:  AA  AA    AA     B      4      5        5
 4:  AA  AA    AA     B      4      5        5
 5:  RA  AA    RR     B      0      5        5
 6:  RR  AA    RR     B      4      5        5
 7:  AA  AA    AA     B      4      5        5
 8:  AA  AA    RA     C      3      3        0
 9:  AA  AA    RA     C      3      3        0
10:  AA  RR    RA     C      3      3        0

On Tue, 30 Dec 2014, Kate Ignatius wrote:
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
---------------------------------------------------------------------------
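The data.table code in the reply above repeats the same four statements for each column. As a hedged alternative that needs no extra package, the same per-group rule (each AA/RR row gets the number of AA/RR rows in its Group, every other row gets 0) can be sketched in base R with ave(). The loop over column names and the helper count_if_hom() are hypothetical choices for illustration, not part of the original reply.

```r
# Example data from the question
x <- read.table(text = "Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA C
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: rows flagged 1 get the group's total of 1s, rows flagged 0 stay 0
count_if_hom <- function(hom) hom * sum(hom)

for (col in c("Dad", "Mum", "Child")) {
  hom <- as.integer(x[[col]] %in% c("AA", "RR"))  # 1 = AA or RR, 0 otherwise
  # ave() applies the helper within each Group and keeps the original row order
  x[[paste0("sum", tolower(col))]] <- ave(hom, x$Group, FUN = count_if_hom)
}
x
```

The resulting sumdad, summum and sumchild columns match the table printed above. Inside data.table the same helper could be used per group, e.g. DT[, sumdad := count_if_hom(as.integer(Dad %in% c("AA", "RR"))), by = Group], since it returns one value per row of the group.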
Is it possible to add the following code, or something similar, in data.table:

childseg<-0
x:=sumchild <-0
span<-rle(x)$lengths[rle(x)$values==TRUE]
childseg[x]<-rep(seq_along(span), times = spanLOH)
childseg[childseg == 0]<-''

I was hoping to run this code by SNPEFF_GENE_NAME for Mum, Dad and Child. The problem I'm having is with the span<-rle(x)$lengths[rle(x)$values==TRUE] line, which I'm not sure can be added to data.table.

On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
Is it possible to add the following code, or something similar, in data.table:

childseg<-0
x:=sumchild <-0
span<-rle(x)$lengths[rle(x)$values==TRUE]
childseg[x]<-rep(seq_along(span), times = span)
childseg[childseg == 0]<-''

I was hoping to run this code by Group for Mum, Dad and Child. The problem I'm having is with the span<-rle(x)$lengths[rle(x)$values==TRUE] line, which I'm not sure can be added to data.table.

[Previous email had incorrect code]

On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
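The segment numbering in the follow-up (childseg) can also be done per group, provided the rle() result is expanded back to one value per row before it is assigned. A minimal base-R sketch, assuming AA/RR marks a 1 and every other genotype a 0; the helper seg_ids() and the *seg column names are hypothetical, and non-AA/RR rows get 0 here rather than ''.

```r
# Example data from the question
x <- read.table(text = "Dad Mum Child Group
AA RR RA A
AA RR RR A
AA AA AA B
AA AA AA B
RA AA RR B
RR AA RR B
AA AA AA B
AA AA RA C
AA AA RA C
AA RR RA C
", header = TRUE, stringsAsFactors = FALSE)

# Hypothetical helper: number the runs of 1s as 1, 2, 3, ...; rows that are 0 get 0.
# This mirrors the span / rep(seq_along(span), times = span) idea from the email.
seg_ids <- function(hom) {
  r <- rle(hom)
  span <- r$lengths[r$values == 1L]        # lengths of the runs of 1s only
  ids <- integer(length(hom))              # start with 0 for every row
  ids[hom == 1L] <- rep(seq_along(span), times = span)
  ids
}

for (col in c("Dad", "Mum", "Child")) {
  hom <- as.integer(x[[col]] %in% c("AA", "RR"))  # 1 = AA or RR, 0 otherwise
  # ave() runs seg_ids() separately within each Group
  x[[paste0(tolower(col), "seg")]] <- ave(hom, x$Group, FUN = seg_ids)
}
x
```

To get the '' of the original snippet, a column can be converted afterwards, e.g. x$childseg[x$childseg == 0] <- "", which turns it into a character column. In data.table, DT[, childseg := seg_ids(as.integer(Child %in% c("AA", "RR"))), by = Group] should work the same way, because seg_ids() returns one value per row of each group; a bare span <- rle(...)$lengths[...] cannot be assigned with := because it has one value per run rather than per row.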