Hi,

I am struggling to split a data.frame according to the scheme below:

DF = data.frame(name = c('a', 'v', 'c'), val = 0); DF

split_str = c('a', 'c')

Now, for each element in split_str, R should find which row of DF contains that element, and return DF with all rows starting from the row after that element and ending with the row just before the next element.

So in my case, I should see 2 data.frames:

1st data.frame with name = 'v' (i.e. the 2nd row of DF)
2nd data.frame with 0 rows (as there is no row left after 'c')

Similarly, if split_str = c('v'), then my 2 data.frames will be:

1st data.frame with name = 'a'
2nd data.frame with name = 'c'

Any idea how to implement the above scheme efficiently would be highly appreciated. I tried the split() function, but it is not giving the right answer.

Thanks,
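One way to get these pieces from split() itself is to build the grouping vector with cumsum() over the marker rows. This is a minimal sketch under an assumed reading of the scheme: marker rows are dropped, rows before the first marker form piece "0", and rows after the k-th marker (up to the next marker) form piece k. The function name split_at_markers is hypothetical.

```r
# Hypothetical helper: split a data frame at "marker" rows.
# Rows whose `col` value is in `s` act as separators and are dropped;
# piece "0" holds the rows before the first marker, piece "k" the rows after the k-th marker.
split_at_markers <- function(data, col, s) {
  is_marker <- data[[col]] %in% s
  grp <- cumsum(is_marker)   # increases by 1 at every marker row
  # Fixing the factor levels keeps empty segments as 0-row data frames
  split(data[!is_marker, , drop = FALSE],
        factor(grp[!is_marker], levels = 0:sum(is_marker)))
}

DF <- data.frame(name = c('a', 'v', 'c'), val = 0)
res1 <- split_at_markers(DF, "name", c('a', 'c'))
res2 <- split_at_markers(DF, "name", 'v')
```

With split_str = c('a', 'c') the result has pieces "0" (empty), "1" ('v') and "2" (empty); with split_str = 'v' it has "0" ('a') and "1" ('c'). So both examples are reproduced once the leading empty piece of the first example is discarded.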
Hello,

Maybe something like the following.

splitDF <- function(data, col, s){
  n <- nrow(data)
  inx <- which(data[[col]] %in% s)
  lapply(seq_along(inx), function(i){
    # last row of this segment: the row before the next marker, or the final row
    last <- if(i < length(inx)) inx[i + 1] - 1 else n
    k <- if(inx[i] < last) (inx[i] + 1):last else integer(0)
    data[k, , drop = FALSE]
  })
}

splitDF(DF, "name", split_str)

Hope this helps,

Rui Barradas

On 5/19/2018 12:07 PM, Christofer Bogaso wrote:
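Keeping the same interpretation as splitDF() (one piece per marker, holding the rows after it), the first and last row of every segment can also be computed up front and extracted with Map(). This is a sketch, not the thread's code; the name splitDF2 is hypothetical, and it returns an empty list when nothing matches.

```r
# Hypothetical variant of the splitDF() idea with precomputed segment bounds.
splitDF2 <- function(data, col, s) {
  n <- nrow(data)
  inx <- which(data[[col]] %in% s)       # marker rows, in row order
  if (length(inx) == 0) return(list())   # nothing matched: no segments
  from <- inx + 1                        # a segment starts right after its marker
  to   <- c(inx[-1] - 1, n)              # and ends before the next marker, or at the last row
  # seq_len(max(0, ...)) yields no rows when a segment is empty (from > to)
  Map(function(a, b) data[seq_len(max(0, b - a + 1)) + a - 1, , drop = FALSE],
      from, to)
}

DF <- data.frame(name = c('a', 'v', 'c'), val = 0)
res <- splitDF2(DF, "name", c('a', 'c'))
```

Here res[[1]] holds the 'v' row and res[[2]] is a 0-row data.frame, because no row follows 'c'.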
DF = data.frame(name = c('a', 'v', 'c'), val = 0); DF
##   name val
## 1    a   0
## 2    v   0
## 3    c   0

split_str = c('a', 'c')

# If we assume that the values in split_str are ordered in the same order as in the dataframe, then this might work.
offsets <- match(split_str, DF$name)

# Since you only want the rows in between
DF[(offsets[1] + 1):(offsets[2] - 1), ]
##   name val
## 2    v   0

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.

On Sat, May 19, 2018 at 7:58 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
... yes, but note that:

which(data[[col]] %in% s)

can be replaced directly by match:

match(s, data[[col]])

Corner cases (nothing matches, duplicated values, etc.) would also have to be checked, since match() returns NA for values that are not found and only the first occurrence of a duplicated value; you probably should also sort the matched row numbers for safety.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Sat, May 19, 2018 at 7:58 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
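To make the difference between the two forms concrete, here is a small sketch with made-up vectors x and s that contain a duplicate ('a') and a value that does not occur ('q'). Note that match() needs the arguments in the order match(s, x) to return row positions of x.

```r
x <- c('b', 'a', 'v', 'a', 'c')
s <- c('c', 'a', 'q')

all_rows   <- which(x %in% s)   # every matching position, in row order: 2 4 5
first_rows <- match(s, x)       # first position of each element of s, in s's order: 5 2 NA
safe_rows  <- sort(first_rows)  # sort() drops the NA by default: 2 5
```

So the two are interchangeable only when every element of s occurs exactly once; otherwise the sort and NA handling mentioned above matter.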
Forgot to take care of the boundary conditions:

# revised data.frame to take care of boundary conditions
DF = data.frame(name = c('b', 'a', 'v', 'z', 'c', 'd'), val = 0); DF
##   name val
## 1    b   0
## 2    a   0
## 3    v   0
## 4    z   0
## 5    c   0
## 6    d   0

split_str = c('a', 'c')

# If we assume that the values in split_str are ordered in
# the same order as in the dataframe, then this might work.
offsets <- match(split_str, DF$name)

# now find the values in between the offsets
ret_indx <- NULL
for (i in seq_len(length(offsets) - 1)){
  if (offsets[i + 1] - offsets[i] > 1){  # something in between
    ret_indx <- c(ret_indx, (offsets[i] + 1):(offsets[i + 1] - 1))
  }
}
DF[ret_indx, ]
##   name val
## 3    v   0
## 4    z   0

Jim Holtman

On Sat, May 19, 2018 at 4:07 AM, Christofer Bogaso <bogaso.christofer at gmail.com> wrote:
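Jim's loop grows ret_indx one segment at a time. The same computation can be sketched without the loop by pairing each offset with the next one via Map(), under the same assumption that every element of split_str occurs in DF$name and in row order. The helper name between is hypothetical. Keeping the per-pair index vectors also gives the separate data.frames the original question asked for.

```r
DF <- data.frame(name = c('b', 'a', 'v', 'z', 'c', 'd'), val = 0)
split_str <- c('a', 'c')
offsets <- match(split_str, DF$name)   # assumes no NA, i.e. every marker is found

# Rows strictly between two offsets; empty when the offsets are adjacent
between <- function(from, to) seq_len(max(0, to - from - 1)) + from

# One index vector per consecutive pair of offsets
pieces_idx <- Map(between, head(offsets, -1), offsets[-1])
ret_indx <- unlist(pieces_idx)
res <- DF[ret_indx, ]                  # same rows as the loop: v, z
pieces <- lapply(pieces_idx, function(k) DF[k, , drop = FALSE])  # one data.frame per pair
```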
Hi!

How about this:

--- snip ---
for (i in seq_len(length(split_str) - 1)) {
  assign(paste("DF", i, sep = ""),
         DF[c((which(DF$name == split_str[i]) + 1):(which(DF$name == split_str[i + 1]) - 1)), ])
}
--- snip ---

'assign' creates for each subset a new data.frame DFn, where n is a count (1, 2, ...).

But note: if your DF has duplicates in 'name' (e.g. two rows with 'a' in DF$name), my solution will use the first occurrence only (and this for both the start and the end).

HTH,
Kimmo

On 2018-05-19 at 16:37 +0530, Christofer Bogaso wrote:
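If separate objects are not essential, a common R alternative to assign() is to keep the same loop but collect the pieces in a named list, which makes them easier to iterate over later (e.g. lapply(pieces, nrow)). This is a sketch rather than Kimmo's code; it also guards against a reversed range when two markers sit in adjacent rows.

```r
DF <- data.frame(name = c('a', 'v', 'c'), val = 0)
split_str <- c('a', 'c')

pos <- match(split_str, DF$name)   # row of the first occurrence of each marker
pieces <- list()
for (i in seq_len(length(split_str) - 1)) {
  from <- pos[i] + 1
  to <- pos[i + 1] - 1
  rows <- if (from <= to) from:to else integer(0)  # no rows between adjacent markers
  pieces[[paste0("DF", i)]] <- DF[rows, , drop = FALSE]
}
```

After the loop, pieces$DF1 holds the 'v' row, the same data.frame that assign() would have stored in DF1.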