thr3ads.net - R help - [R] data frame from list of lists with unequal lengths [Jul 2009]

Ben Mazzotta

2009-Jul-20 20:46 UTC

[R] data frame from list of lists with unequal lengths

Hello,

I have a dataset with multiple entries in one field separated by "/"
characters. (The true dataset has long names, 20-odd variables, and
hundreds of observations.)


     v1 v2
1     A  L
2   A/B  M
3     C  N
4 D/E/F  O
5     A  P
6     C  L


What I would like is to have a dataset that looks like this instead:
> my.df
  v1 v2
1  A  L
2  A  M
3  B  M
4  C  N
5  D  O
6  E  O
7  F  O
8  A  P
9  C  L


My original thought was to break the string into variables using
strsplit(), create new columns in the data frame using cbind(), and then
reshape the dataset with the melt() function.
> v1.new <- as.character(my.df$v1)
> v1.new <- strsplit(v1.new, "/")
> v1.new
[[1]]
[1] "A"

[[2]]
[1] "A" "B"

[[3]]
[1] "C"

[[4]]
[1] "D" "E" "F"

[[5]]
[1] "A"

[[6]]
[1] "C"

My next thought was to coerce the list into a data frame, but I ran
into an error because the list output from strsplit() does not contain
equal length vectors.
> v1.cols <- data.frame(v1.new, check.rows=FALSE)
Error in data.frame("A", c("A", "B"), "C", c("D", "E", "F"), "A", "C",  :
  arguments imply differing number of rows: 1, 2, 3


How can I create a data frame from the unequal length vectors that
result from strsplit(my.df$v1)?

Am I going about this the wrong way? I have also tried to use
colsplit{reshape} without success.

Thank you for any advice you can offer. I hope the answer to this
question is not too obvious.
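
The wide layout described in the question (one column per piece of v1) can be sketched by padding the strsplit() output with NA until every vector has the same length. This is only an illustration under assumptions: the data are retyped from the printout above, and the column names v1.1, v1.2, v1.3 are invented for the example.

```r
# Example data retyped from the question (assumed to match my.df)
my.df <- data.frame(v1 = c("A", "A/B", "C", "D/E/F", "A", "C"),
                    v2 = c("L", "M", "N", "O", "P", "L"),
                    stringsAsFactors = FALSE)

pieces <- strsplit(as.character(my.df$v1), "/")

# Pad every vector with NA up to the longest length so they line up
n.max <- max(sapply(pieces, length))
padded <- lapply(pieces, function(p) { length(p) <- n.max; p })

# One row per original observation, one column per piece
v1.cols <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(v1.cols) <- paste("v1", seq_len(n.max), sep = ".")  # hypothetical names
wide <- cbind(v1.cols, v2 = my.df$v2, stringsAsFactors = FALSE)
```

Reshaping wide to long form and dropping the NA rows would then give the desired result, although going straight to the long form avoids the intermediate step.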

jim holtman

2009-Jul-20 22:06 UTC

try this:
> x
     v1 v2
1     A  L
2   A/B  M
3     C  N
4 D/E/F  O
5     A  P
6     C  L
> as.data.frame(do.call(rbind, apply(x, 1, function(.row){
+     cbind(strsplit(.row[1], '/')[[1]], .row[2])
+ })),row.names='')
  V1 V2
1  A  L
2  A  M
3  B  M
4  C  N
5  D  O
6  E  O
7  F  O
8  A  P
9  C  L
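
The one-liner above can be unpacked into two steps: build a small block for each original row, then stack the blocks with do.call(rbind, ...). The sketch below is a variation rather than the code from the reply: it indexes the data frame directly instead of going through apply(), which converts the data frame to a character matrix first. The helper name expand.row is made up for the example, and the data are retyped from the printout.

```r
# Example data retyped from the thread (assumed to match x)
x <- data.frame(v1 = c("A", "A/B", "C", "D/E/F", "A", "C"),
                v2 = c("L", "M", "N", "O", "P", "L"),
                stringsAsFactors = FALSE)

# Build one small data frame per original row; v2 is recycled
# to match the number of pieces in v1
expand.row <- function(i) {   # hypothetical helper name
    data.frame(v1 = strsplit(as.character(x$v1[i]), "/")[[1]],
               v2 = x$v2[i],
               stringsAsFactors = FALSE)
}
blocks <- lapply(seq_len(nrow(x)), expand.row)

# Stack the per-row blocks into one long data frame
long <- do.call(rbind, blocks)
rownames(long) <- NULL
```

Written this way the columns keep their original names (v1, v2) instead of the V1, V2 that as.data.frame() assigns to an unnamed matrix.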

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

Henrique Dallazuanna

2009-Jul-21 01:10 UTC


[R] data frame from list of lists with unequal lengths

Try this:

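# split v1 on "/" into a list of character vectors, one per row
r <- strsplit(as.character(x$v1), "/")
# repeat each v2 once per piece; data.frame() keeps a factor v2 as labels,
# whereas cbind() would return a matrix holding the factor's integer codes
data.frame(v1 = unlist(r), v2 = rep(x$v2, sapply(r, length)))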
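
Since the real dataset is said to have 20-odd variables, the same rep() idea can be applied to row indices instead of a single column, so every other column is carried along. This is a sketch under assumptions: the data are retyped from the printout, and v3 is a made-up extra column standing in for the remaining variables.

```r
# Example data retyped from the thread; v3 is a hypothetical extra column
x <- data.frame(v1 = c("A", "A/B", "C", "D/E/F", "A", "C"),
                v2 = c("L", "M", "N", "O", "P", "L"),
                v3 = 1:6,
                stringsAsFactors = FALSE)

r <- strsplit(as.character(x$v1), "/")
n <- sapply(r, length)          # number of pieces in each row

# Repeat each row index once per piece, so all columns are duplicated
long <- x[rep(seq_len(nrow(x)), n), ]

# Overwrite v1 with the individual pieces, in the same row order
long$v1 <- unlist(r)
rownames(long) <- NULL
```

Because the rows are selected by index, no column has to be named explicitly, and column types (factor, numeric, character) are preserved.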


-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O

