thr3ads.net - R help - [R] How should I denormalise a data frame list of lists column? [Jul 2009]

If this information is useful, please help other people find it:
Share via:

Tim Slidel

2009-Jul-01 19:21 UTC

[R] How should I denormalise a data frame list of lists column?

Hi,

I have a data frame where one column is a list of lists. I would like to
subset the data frame based on membership of the lists in that column and be
able to 'denormalise' the data frame so that a row is duplicated for
each of
its list elements. Example code follows:

# The data is read in in this form with the c2 list values in single strings
which I then split to give lists:> f1 <- data.frame(c1=0:2, c2=c("A,B,C", "A,E",
"F,G,H"))
> f1$Split <- strsplit(as.character(f1$c2), ",")
> f1  c1    c2   Split
1  0 A,B,C A, B, C
2  1   A,E    A, E
3  2 F,G,H F, G, H

# So f1$Split is the list of lists column I want to denormalise or use as
the subject for subsetting

# f2 is data to use to select subsets from f1> f2 <- data.frame(c1=LETTERS[0:8], c2=c("Apples",
"Badger","Camel","Dog","Elephants","Fish","Goat","Horse"))> f2  c1       c2
1  A    Apple
2  B   Badger
3  C    Camel
4  D      Dog
5  E Elephant
6  F     Fish
7  G     Goat
8  H    Horse

# I was able to find which rows of f2 are represented in the f1 lists (not
entirely sure if this is the best way to do this):> f3 <- f2[f2$c1 %in% unlist(f1$Split),]
> f3  c1       c2
1  A    Apple
2  B   Badger
3  C    Camel
5  E Elephant
6  F     Fish
7  G     Goat
8  H    Horse

# Note that 'D' is missing from f3 because it is not in any of the
f1$Split
lists

# f4 is a subset of f3 and I want to find the rows of f1 where f1$Split
contains any of f4$c1:> f4 <- f3[c(1,3),]
> f4  c1    c2
1  A Apple
3  C Camel

# I tried this and it didn't work, presumably because it's trying to
match
against each list object rather than the list elements, but unlist doesn't
do the trick here because I need the individual rows, I need to unlist on a
row by row basis.> f1[f1$Split %in% f4$c1,][1] c1    c2    Split
<0 rows> (or 0-length row.names)> f1[f4$c1 %in% f1$Split,][1] c1    c2    Split
<0 rows> (or 0-length row.names)> f1[match(f4$c1, f1$Split),]     c1   c2 Split
NA   NA <NA>  NULL
NA.1 NA <NA>  NULL

I also looked at reshape which I don't think helps. I thought I might be
able to create a new data frame with the f1$Split denormalised and use that,
but couldn't find a way to do this, the result I'd want there is
something
like:> f1_denorm  c1    c2   Split       SplitDenorm
1  0 A,B,C A, B, C   A
2  0 A,B,C A, B, C   B
3  0 A,B,C A, B, C   C
4  1   A,E    A, E     A
5  1   A,E    A, E     E
6  2 F,G,H F, G, H   F
7  2 F,G,H F, G, H  G
8  2 F,G,H F, G, H  H

I thought perhaps for loops would be the next thing to try, but there must
be a better way!

Thanks for any help.

Tim

	[[alternative HTML version deleted]]

Tim Slidel

2009-Jul-01 19:31 UTC

head link

[R] How should I denormalise a data frame list of lists column?

Hi,

(apologies for initial html posting)

I have a data frame where one column is a list of lists. I would like to
subset the data frame based on membership of the lists in that column and be
able to 'denormalise' the data frame so that a row is duplicated for
each of
its list elements. Example code follows:

# The data is read in in this form with the c2 list values in single strings
which I then split to give lists:> f1 <- data.frame(c1=0:2, c2=c("A,B,C", "A,E",
"F,G,H"))
> f1$Split <- strsplit(as.character(f1$c2), ",")
> f1  c1    c2   Split
1  0 A,B,C A, B, C
2  1   A,E    A, E
3  2 F,G,H F, G, H

# So f1$Split is the list of lists column I want to denormalise or use as
the subject for subsetting

# f2 is data to use to select subsets from f1> f2 <- data.frame(c1=LETTERS[0:8], c2=c("Apples",
"Badger","Camel","Dog","Elephants","Fish","Goat","Horse"))> f2  c1       c2
1  A    Apple
2  B   Badger
3  C    Camel
4  D      Dog
5  E Elephant
6  F     Fish
7  G     Goat
8  H    Horse

# I was able to find which rows of f2 are represented in the f1 lists (not
entirely sure if this is the best way to do this):> f3 <- f2[f2$c1 %in% unlist(f1$Split),]
> f3  c1       c2
1  A    Apple
2  B   Badger
3  C    Camel
5  E Elephant
6  F     Fish
7  G     Goat
8  H    Horse

# Note that 'D' is missing from f3 because it is not in any of the
f1$Split
lists

# f4 is a subset of f3 and I want to find the rows of f1 where f1$Split
contains any of f4$c1:> f4 <- f3[c(1,3),]
> f4  c1    c2
1  A Apple
3  C Camel

# I tried this and it didn't work, presumably because it's trying to
match
against each list object rather than the list elements, but unlist doesn't
do the trick here because I need the individual rows, I need to unlist on a
row by row basis.> f1[f1$Split %in% f4$c1,][1] c1    c2    Split
<0 rows> (or 0-length row.names)> f1[f4$c1 %in% f1$Split,][1] c1    c2    Split
<0 rows> (or 0-length row.names)> f1[match(f4$c1, f1$Split),]     c1   c2 Split
NA   NA <NA>  NULL
NA.1 NA <NA>  NULL

I also looked at reshape which I don't think helps. I thought I might be
able to create a new data frame with the f1$Split denormalised and use that,
but couldn't find a way to do this, the result I'd want there is
something
like:> f1_denorm  c1    c2   Split       SplitDenorm
1  0 A,B,C A, B, C   A
2  0 A,B,C A, B, C   B
3  0 A,B,C A, B, C   C
4  1   A,E    A, E     A
5  1   A,E    A, E     E
6  2 F,G,H F, G, H   F
7  2 F,G,H F, G, H  G
8  2 F,G,H F, G, H  H

I thought perhaps for loops would be the next thing to try, but there must
be a better way!

Thanks for any help.

Tim

Maybe Matching Threads

Search for more maybe matching threads

R help - Jul 2009 - How should I denormalise a data frame list of lists column?

[R] How should I denormalise a data frame list of lists column?

[R] How should I denormalise a data frame list of lists column?

Maybe Matching Threads