Is it possible to add the following code or similar in data.table:

childseg<-0
x:=sumchild <-0
span<-rle(x)$lengths[rle(x)$values==TRUE
childseg[x]<-rep(seq_along(span), times = span)
childseg[childseg == 0]<-''

I was hoping to do this code by Group for mum, dad and child. The problem I'm having is with the span<-rle(x)$lengths[rle(x)$values==TRUE line, which I'm not sure can be added to data.table.

[Previous email had incorrect code]

On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> I do not understand the value of using the rle function in your description,
> but the code below appears to produce the table you want.
>
> Note that better support for the data.table package might be found at
> stackexchange as the documentation specifies.
>
> x <- read.table( text=
> "Dad Mum Child Group
> AA RR RA A
> AA RR RR A
> AA AA AA B
> AA AA AA B
> RA AA RR B
> RR AA RR B
> AA AA AA B
> AA AA RA C
> AA AA RA C
> AA RR RA C
> ", header=TRUE, stringsAsFactors=FALSE )
>
> library(data.table)
> DT <- data.table( x )
> DT[ , cdad := as.integer( Dad %in% c( "AA", "RR" ) ) ]
> DT[ , sumdad := 0L ]
> DT[ 1==DT$cdad, sumdad := sum( cdad ), by=Group ]
> DT[ , cdad := NULL ]
> DT[ , cmum := as.integer( Mum %in% c( "AA", "RR" ) ) ]
> DT[ , summum := 0L ]
> DT[ 1==DT$cmum, summum := sum( cmum ), by=Group ]
> DT[ , cmum := NULL ]
> DT[ , cchild := as.integer( Child %in% c( "AA", "RR" ) ) ]
> DT[ , sumchild := 0L ]
> DT[ 1==DT$cchild, sumchild := sum( cchild ), by=Group ]
> DT[ , cchild := NULL ]
>
>> DT
>
>     Dad Mum Child Group sumdad summum sumchild
>  1:  AA  RR    RA     A      2      2        0
>  2:  AA  RR    RR     A      2      2        1
>  3:  AA  AA    AA     B      4      5        5
>  4:  AA  AA    AA     B      4      5        5
>  5:  RA  AA    RR     B      0      5        5
>  6:  RR  AA    RR     B      4      5        5
>  7:  AA  AA    AA     B      4      5        5
>  8:  AA  AA    RA     C      3      3        0
>  9:  AA  AA    RA     C      3      3        0
> 10:  AA  RR    RA     C      3      3        0
>
>
> On Tue, 30 Dec 2014, Kate Ignatius wrote:
>
>> I'm trying to use both these packages and wondering whether they are
>> possible...
>>
>> To make this simple, my ultimate goal is to determine long stretches of
>> 1s, but I want to do this within groups (hence using data.table, as
>> I use the "set key" option). However, I'm not having much luck
>> making this possible.
>>
>> For example, for simplicity's sake, I have the following data:
>>
>> Dad Mum Child Group
>> AA  RR  RA    A
>> AA  RR  RR    A
>> AA  AA  AA    B
>> AA  AA  AA    B
>> RA  AA  RR    B
>> RR  AA  RR    B
>> AA  AA  AA    B
>> AA  AA  RA    C
>> AA  AA  RA    C
>> AA  RR  RA    C
>>
>> And the following code, which I know works:
>>
>> hetdad <- as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")
>> sumdad <- rle(hetdad)$lengths[rle(hetdad)$values==1]
>>
>> hetmum <- as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")
>> summum <- rle(hetmum)$lengths[rle(hetmum)$values==1]
>>
>> hetchild <- as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")
>> sumchild <- rle(hetchild)$lengths[rle(hetchild)$values==1]
>>
>> However, I wish to do the above code by Group (the real file is
>> millions of rows long and the groups will be larger; I just wanted to
>> simplify the example).
>>
>> I did something like this, but of course I got an error:
>>
>> LOH[,hetdad:=as.numeric(x[c(1)]=="AA" | x[c(1)]=="RR")]
>> LOH[,sumdad:=rle(hetdad)$lengths[rle(hetdad)$values==1],by=Group]
>> LOH[,hetmum:=as.numeric(x[c(2)]=="AA" | x[c(2)]=="RR")]
>> LOH[,summum:=rle(hetmum)$lengths[rle(hetmum)$values==1],by=Group]
>> LOH[,hetchild:=as.numeric(x[c(3)]=="AA" | x[c(3)]=="RR")]
>> LOH[,sumchild:=rle(hetchild)$lengths[rle(hetchild)$values==1],by=Group]
>>
>> The reason is that I eventually want something like this:
>>
>> Dad Mum Child Group sumdad summum sumchild
>> AA  RR  RA    A     2      2      0
>> AA  RR  RR    A     2      2      1
>> AA  AA  AA    B     4      5      5
>> AA  AA  AA    B     4      5      5
>> RA  AA  RR    B     0      5      5
>> RR  AA  RR    B     4      5      5
>> AA  AA  AA    B     4      5      5
>> AA  AA  RA    C     3      3      0
>> AA  AA  RA    C     3      3      0
>> AA  RR  RA    C     3      3      0
>>
>> That is, I would like to have the specific counts next to what I'm
>> consecutively counting per group.
>> So for Group A, for dad there are 2 AAs, for mum there are two RRs,
>> but for the child there is only 1 AA or RR, and that is RR (so the 1 is
>> next to the RR and not the RA).
>>
>> Can this be done?
>>
>> K.
>>
>> ______________________________________________
>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller                        The     .....       .....  Go Live...
> DCN:<jdnewmil at dcn.davis.ca.us>     Basics: ##.#.       ##.#.  Live Go...
>                                       Live:   OO#.. Dead: OO#..  Playing
> Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
> /Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
> ---------------------------------------------------------------------------
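On the original Dec 30 goal (lengths of consecutive AA/RR stretches within each Group): the LOH attempt computes rle(...)$lengths[...], which gives one value per run rather than one per row, so it does not line up with the rows that := assigns to within each group. Expanding the per-run values back to rows with rep(..., times = r$lengths) fixes that. The following is a minimal sketch, assuming a data.table built from the example data; the helper run_length() and the rundad/runmum/runchild column names are hypothetical. Unlike the sum-based columns in Jeff's reply, this reports the length of the run each row belongs to, so Dad in Group B gets 2 rather than 4 because the RA row breaks the run.

```r
library(data.table)

# Example data from the thread
DT <- data.table(
  Dad   = c("AA", "AA", "AA", "AA", "RA", "RR", "AA", "AA", "AA", "AA"),
  Mum   = c("RR", "RR", "AA", "AA", "AA", "AA", "AA", "AA", "AA", "RR"),
  Child = c("RA", "RR", "AA", "AA", "RR", "RR", "AA", "RA", "RA", "RA"),
  Group = c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C")
)

# Hypothetical helper: for a logical vector, return one value per element,
# namely the length of the TRUE run it belongs to, or 0 for FALSE elements.
run_length <- function(flag) {
  r <- rle(flag)
  # rep() expands the per-run values back to one value per row
  rep(ifelse(r$values, r$lengths, 0L), times = r$lengths)
}

# Apply within each Group, once for each of the three columns
for (col in c("Dad", "Mum", "Child")) {
  out <- paste0("run", tolower(col))   # rundad, runmum, runchild
  DT[, (out) := run_length(.SD[[1L]] %in% c("AA", "RR")),
     by = Group, .SDcols = col]
}
DT
```

The rle step depends on row order, so this assumes the rows within each Group are already in the order in which stretches should be counted. If only the total count per group is wanted, as in Jeff's reply, rle is not needed at all.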
Thank you for attempting to encode what you want using R syntax, but you are not really succeeding yet (too many errors). Perhaps another hand-generated result would help? A new input data frame might or might not be needed to illustrate the desired results.

Your second and third lines are syntactically incorrect, and I don't understand what you hope to accomplish by assigning an empty string to a numeric in your last line.

Sent from my phone. Please excuse my brevity.

On January 1, 2015 4:16:52 AM PST, Kate Ignatius <kate.ignatius at gmail.com> wrote:
>Is it possible to add the following code or similar in data.table:
>
>childseg<-0
>x:=sumchild <-0
>span<-rle(x)$lengths[rle(x)$values==TRUE
>childseg[x]<-rep(seq_along(span), times = span)
>childseg[childseg == 0]<-''
>
>I was hoping to do this code by Group for mum, dad and
>child. The problem I'm having is with the
>span<-rle(x)$lengths[rle(x)$values==TRUE line which I'm not sure can
>be added to data.table.
>
>[Previous email had incorrect code]
>
>On Wed, Dec 31, 2014 at 3:45 AM, Jeff Newmiller
><jdnewmil at dcn.davis.ca.us> wrote:
>> I do not understand the value of using the rle function in your description,
>> but the code below appears to produce the table you want.
>>
>> Note that better support for the data.table package might be found at
>> stackexchange as the documentation specifies.
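Jeff's last point can be checked directly. A minimal illustration with made-up values (not from the thread): an R vector holds a single type, so assigning '' into a numeric vector converts every element to character, including the elements that were not replaced. Using NA instead, assuming the goal is only to mark rows outside a segment, keeps the column numeric.

```r
# A numeric vector of segment numbers, where 0 means "not in a segment"
childseg <- c(0, 1, 1, 0, 2)
class(childseg)    # "numeric"

# Assigning "" to some elements coerces the whole vector to character
childseg[childseg == 0] <- ""
class(childseg)    # "character"
childseg           # "", "1", "1", "", "2"

# If the zeros only need to be blanked out, NA keeps the vector numeric
seg <- c(0, 1, 1, 0, 2)
seg[seg == 0] <- NA
class(seg)         # "numeric"
```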
Apologies, I mixed up the syntax all over the place, a habit of mine. The last line was in there because of earlier code, so it really doesn't need to be there. Here is the proper code, I hope:

childseg<-0
x<-sumchild ==0
span<-rle(x)$lengths[rle(x)$values==TRUE]
childseg[x]<-rep(seq_along(span), times = span)

On Thu, Jan 1, 2015 at 12:13 PM, Jeff Newmiller <jdnewmil at dcn.davis.ca.us> wrote:
> Thank you for attempting to encode what you want using R syntax, but you are not really succeeding yet (too many errors). Perhaps another hand-generated result would help? A new input data frame might or might not be needed to illustrate the desired results.
>
> Your second and third lines are syntactically incorrect, and I don't understand what you hope to accomplish by assigning an empty string to a numeric in your last line.
>
> On January 1, 2015 4:16:52 AM PST, Kate Ignatius <kate.ignatius at gmail.com> wrote:
>>Is it possible to add the following code or similar in data.table:
>>
>>childseg<-0
>>x:=sumchild <-0
>>span<-rle(x)$lengths[rle(x)$values==TRUE
>>childseg[x]<-rep(seq_along(span), times = span)
>>childseg[childseg == 0]<-''
>>
>>I was hoping to do this code by Group for mum, dad and
>>child. The problem I'm having is with the
>>span<-rle(x)$lengths[rle(x)$values==TRUE line which I'm not sure can
>>be added to data.table.
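A sketch of the corrected childseg code applied per Group, assuming the sumchild column from Jeff's reply (values copied from his printed table). Two changes from the code as posted: the result is allocated at full length up front, because starting from childseg<-0 (a length-one vector) and assigning through the longer logical index x makes R extend the vector, so positions outside the TRUE runs can end up as NA instead of 0; and the logic is wrapped in a function so data.table can call it once per group. The name segment_ids() is hypothetical.

```r
library(data.table)

# sumchild values copied from the table printed in Jeff's reply
DT <- data.table(
  Group    = c("A", "A", "B", "B", "B", "B", "B", "C", "C", "C"),
  sumchild = c(0L, 1L, 5L, 5L, 5L, 5L, 5L, 0L, 0L, 0L)
)

# Hypothetical helper: label each run of TRUE in x as 1, 2, 3, ...;
# FALSE positions get 0. The result is allocated at full length first.
segment_ids <- function(x) {
  seg  <- integer(length(x))
  r    <- rle(x)
  span <- r$lengths[r$values]          # lengths of the TRUE runs only
  seg[x] <- rep(seq_along(span), times = span)
  seg
}

segment_ids(c(TRUE, TRUE, FALSE, TRUE))   # 1 1 0 2

# Numbering restarts at 1 within each Group
DT[, childseg := segment_ids(sumchild == 0), by = Group]
DT
```

The same call with the dad or mum columns (for example sumdad == 0) would give the other two segment columns.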