thr3ads.net - R help - [R] Cumulative split of value in data frame column [Jun 2020]


Ravi Jeyaraman

2020-Jun-05 16:32 UTC

[R] Cumulative split of value in data frame column

Assume I have a data frame like this:

df <- data.frame(ID=1:3,
                 FOO=c('A_B','A_B_C','A_B_C_D_E'))

I want to do a 'cumulative split' of the values in column FOO based on the
delimiter '_'.  The end result should look like this:

ID  FOO        FOO_SPLIT1  FOO_SPLIT2  FOO_SPLIT3  FOO_SPLIT4  FOO_SPLIT5
1   A_B        A           A_B
2   A_B_C      A           A_B         A_B_C
3   A_B_C_D_E  A           A_B         A_B_C       A_B_C_D     A_B_C_D_E

Is there an efficient, optimized way to do this?
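For concreteness, the requested table can be produced in base R with a sketch like the one below. It is only a minimal illustration, not code from the thread: the helper name cumsplit_wide() is hypothetical, it assumes the blank cells in the table above should become NA, and it uses Reduce(accumulate = TRUE) to keep every intermediate paste() result.

```r
# Minimal sketch (hypothetical helper name): build the requested table with
# strsplit() plus Reduce(accumulate = TRUE), padding short rows with NA.
cumsplit_wide <- function(df, col = "FOO", split = "_") {
  pieces <- strsplit(as.character(df[[col]]), split = split, fixed = TRUE)
  # Reduce(accumulate = TRUE) keeps each intermediate paste():
  # c("A", "B", "C") -> c("A", "A_B", "A_B_C")
  cum <- lapply(pieces, function(p)
    unlist(Reduce(function(acc, nxt) paste(acc, nxt, sep = split), p,
                  accumulate = TRUE)))
  k <- max(lengths(cum))
  out <- df
  for (j in seq_len(k)) {
    # p[j] is NA_character_ when that row has fewer than j pieces
    out[[paste0(col, "_SPLIT", j)]] <- vapply(cum, function(p) p[j], character(1))
  }
  out
}

df <- data.frame(ID = 1:3, FOO = c("A_B", "A_B_C", "A_B_C_D_E"),
                 stringsAsFactors = FALSE)
print(cumsplit_wide(df))
```

The number of FOO_SPLIT columns is taken from the longest row, so it adapts to whatever depth the data has. Empty strings in FOO are not handled by this sketch.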



Bert Gunter

2020-Jun-05 18:28 UTC


This is a plain-text list. In future, please post in plain text so that
your post does not get mangled.

Anyway,...

I don't know about "efficient, optimized", but here's one simple way to do
it, using ?strsplit to split and then ?paste to recombine:

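df <- data.frame(ID = 1:3,
                 FOO = c('A_B', 'A_B_C', 'A_B_C_D_E'),
                 stringsAsFactors = FALSE)  # keep FOO character (default before R 4.0.0 was TRUE)

## Given the pieces of one string, build the cumulative pastes:
## c("A", "B", "C") -> c("A", "A_B", "A_B_C")
cumsplit <- function(x, split = "_") {
    w <- x[1]
    ## append each new piece to the previous cumulative string
    for (i in seq_along(x)[-1]) w <- c(w, paste(w[i - 1], x[i], sep = split))
    w
}

> lapply(strsplit(df$FOO, split = "_"), cumsplit)
[[1]]
[1] "A"   "A_B"

[[2]]
[1] "A"     "A_B"   "A_B_C"

[[3]]
[1] "A"         "A_B"       "A_B_C"     "A_B_C_D"   "A_B_C_D_E"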

I wouldn't be surprised if clever use of regexes were faster, but as I
said, this is simple.
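Along the lines of the regex remark, here is a hedged sketch that skips the split-and-repaste step: gregexpr() finds the position of every '_', and substring() takes each prefix ending just before a delimiter, plus the whole string. The helper name cumprefixes() is hypothetical, it returns a list shaped like the lapply() output above, and no timing comparison with cumsplit() has been made. With fixed = TRUE the '_' is matched literally; drop it if the delimiter needs a real regular expression.

```r
# Sketch with a hypothetical helper name; base R only.
# For each string, locate every "_" with gregexpr() and return the prefixes
# that end just before a delimiter, plus the whole string:
# "A_B_C" -> c("A", "A_B", "A_B_C")
cumprefixes <- function(x, split = "_") {
  x <- as.character(x)
  pos <- gregexpr(split, x, fixed = TRUE)   # one integer vector per string
  mapply(function(s, p) {
    p <- p[p > 0]                           # gregexpr() reports -1 for "no match"
    substring(s, 1, c(p - 1, nchar(s)))
  }, x, pos, SIMPLIFY = FALSE, USE.NAMES = FALSE)
}

df <- data.frame(ID = 1:3, FOO = c("A_B", "A_B_C", "A_B_C_D_E"),
                 stringsAsFactors = FALSE)
print(cumprefixes(df$FOO))
```

A string with no delimiter simply comes back as itself, since only the nchar(s) end position remains.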

Bert Gunter



