thr3ads.net - R help - [R] Split a data.frame [May 2018]

Christofer Bogaso

2018-May-19 11:07 UTC

[R] Split a data.frame

Hi,

I am struggling to split a data.frame according to the scheme below:

DF = data.frame(name = c('a', 'v', 'c'), val = 0); DF

split_str = c('a', 'c')

Now, for each element in split_str, R should find the row of DF that
contains that element, and return the rows of DF starting from the row
after that element and ending with the row before the next element.

So in my case, I should see 2 data.frames:

1st data.frame with name = 'v' (i.e. the 2nd row of DF)

2nd data.frame with 0 rows (as there is no row left after 'c')

Similarly, if split_str = c('v'), then my 2 data.frames will be:

1st data.frame with name = 'a'
2nd data.frame with name = 'c'

Any idea how to implement the above scheme efficiently would be highly
appreciated. I tried the split() function, but it does not give the
right answer.

Thanks,

Rui Barradas

2018-May-19 14:58 UTC

Hello,

Maybe something like the following.

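splitDF <- function(data, col, s){
     n <- nrow(data)
     # rows whose value in 'col' is one of the split strings
     inx <- which(data[[col]] %in% s)
     # each piece ends just before the next match, or at the last row
     end <- c(inx[-1] - 1, n)
     lapply(seq_along(inx), function(i){
         # seq_len() yields an empty index when nothing lies in between
         k <- inx[i] + seq_len(end[i] - inx[i])
         data[k, , drop = FALSE]
     })
}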

splitDF(DF, "name", split_str)


Hope this helps,

Rui Barradas
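
The two examples in the original question suggest that the rows named in split_str act as delimiters, and that rows before the first delimiter form a piece of their own. The following is only a sketch under that assumption; the function name split_between is hypothetical and does not come from the thread. It labels every row with a running count of delimiters seen so far (cumsum()), drops the delimiter rows, and hands the labels to split() as a factor with all levels so that empty pieces are kept.

```r
# Hypothetical helper: treat rows whose 'col' value is in 's' as delimiters.
split_between <- function(data, col, s) {
  is_marker <- data[[col]] %in% s
  # 0 before the first delimiter, then incremented at each delimiter row
  chunk <- cumsum(is_marker)
  # keep every level so that empty pieces appear as 0-row data.frames
  f <- factor(chunk[!is_marker], levels = 0:sum(is_marker))
  split(data[!is_marker, , drop = FALSE], f)
}

DF <- data.frame(name = c("a", "v", "c"), val = 0)
res_ac <- split_between(DF, "name", c("a", "c"))
res_v  <- split_between(DF, "name", "v")
```

With split_str = c('a', 'c') this returns three pieces (an empty one before 'a', the 'v' row, and an empty one after 'c'); with split_str = 'v' it returns the 'a' row and the 'c' row. Whether the empty leading piece should be dropped depends on what the poster intended.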


jim holtman

2018-May-19 15:05 UTC

DF = data.frame(name = c('a', 'v', 'c'), val = 0); DF
##   name val
## 1    a   0
## 2    v   0
## 3    c   0
split_str = c('a', 'c')
# If we assume that the values in split_str are in the same order
# as in the data frame, then this might work.

offsets <- match(split_str, DF$name)
# Since you only want the rows in between

DF[(offsets[1] + 1):(offsets[2] - 1), ]
##   name val
## 2    v   0


Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


Bert Gunter

2018-May-19 15:06 UTC

...
yes, but note that:

which(data[[col]] %in% s)

can be replaced directly by match:

match(s, data[[col]])

Corner cases (nothing matches, etc.) would also have to be checked, and
one should probably sort the matched row numbers for safety.

Cheers,
Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and
sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )
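
As a small illustration of why the corner cases and the sorting matter, the sketch below compares the two calls on made-up vectors (x and s are illustrative names, not data from the thread). which(x %in% s) returns every matching position in row order, whereas match(s, x) returns only the first position of each element of s, in the order of s, with NA for anything not found.

```r
x <- c("b", "a", "v", "a", "c")   # stands in for data[[col]]
s <- c("c", "a", "q")             # 'q' does not occur in x

all_hits   <- which(x %in% s)     # 2 4 5: every match, in row order
first_hits <- match(s, x)         # 5 2 NA: first match per element of s
sorted     <- sort(first_hits)    # 2 5: sort() drops the NA by default
```

So the two are interchangeable only when each split string occurs exactly once and split_str is already in row order.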


jim holtman

2018-May-19 15:22 UTC

Forgot to take care of the boundary conditions:

# revised data.frame to take care of boundary conditions
DF = data.frame(name = c('b', 'a', 'v', 'z', 'c', 'd'), val = 0); DF
##   name val
## 1    b   0
## 2    a   0
## 3    v   0
## 4    z   0
## 5    c   0
## 6    d   0
split_str = c('a', 'c')

# If we assume that the values in split_str are ordered in
# the same order as in the dataframe, then this might work.
offsets <- match(split_str, DF$name)

# now find the rows in between the offsets
ret_indx <- NULL
for (i in seq_len(length(offsets) - 1)){
  if (offsets[i + 1] - offsets[i] > 1){  # something inbetween
    ret_indx <- c(ret_indx, (offsets[i] + 1):(offsets[i+1] - 1))
  }
}
DF[ret_indx, ]
##   name val
## 3    v   0
## 4    z   0



Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.


K. Elo

2018-May-19 15:25 UTC

Hi!

How about this:

--- snip ---

for (i in seq_len(length(split_str)-1)) {
    assign(paste("DF",i,sep=""),
           DF[(which(DF$name==split_str[i])+1):(which(DF$name==split_str[i+1])-1), ])
}

--- snip ---

'assign' creates a new data.frame DFn for each subset, where n is a
counter (1,2,...).

But note: if your DF has duplicates in 'name' (e.g. two rows with 'a'
in 'DF$name'), my solution will use the first occurrence only (for
both the start and the end).

HTH,
Kimmo
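
A possible variant of the loop above collects the pieces in a named list instead of creating DF1, DF2, ... with assign(), and guards against the case where two split strings sit on adjacent rows (where a:b would count backwards). This is only a sketch: it assumes every element of split_str occurs in DF$name and that split_str is in row order, and the names pos and pieces are illustrative.

```r
DF <- data.frame(name = c("b", "a", "v", "z", "c", "d"), val = 0)
split_str <- c("a", "c")

# first occurrence of each split string (assumed present and in row order)
pos <- match(split_str, DF$name)

pieces <- lapply(seq_len(length(pos) - 1), function(i) {
  # rows strictly between marker i and marker i + 1; empty if adjacent
  k <- pos[i] + seq_len(max(pos[i + 1] - pos[i] - 1, 0))
  DF[k, , drop = FALSE]
})
names(pieces) <- paste0("DF", seq_along(pieces))
```

The individual pieces are then available as pieces$DF1, pieces$DF2, and so on, without cluttering the workspace.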

