Tim Slidel
2009-Jul-01 19:21 UTC
[R] How should I denormalise a data frame list of lists column?
Hi, I have a data frame where one column is a list of lists. I would like to subset the data frame based on membership of the lists in that column and be able to 'denormalise' the data frame so that a row is duplicated for each of its list elements. Example code follows: # The data is read in in this form with the c2 list values in single strings which I then split to give lists:> f1 <- data.frame(c1=0:2, c2=c("A,B,C", "A,E", "F,G,H")) > f1$Split <- strsplit(as.character(f1$c2), ",") > f1c1 c2 Split 1 0 A,B,C A, B, C 2 1 A,E A, E 3 2 F,G,H F, G, H # So f1$Split is the list of lists column I want to denormalise or use as the subject for subsetting # f2 is data to use to select subsets from f1> f2 <- data.frame(c1=LETTERS[0:8], c2=c("Apples","Badger","Camel","Dog","Elephants","Fish","Goat","Horse"))> f2c1 c2 1 A Apple 2 B Badger 3 C Camel 4 D Dog 5 E Elephant 6 F Fish 7 G Goat 8 H Horse # I was able to find which rows of f2 are represented in the f1 lists (not entirely sure if this is the best way to do this):> f3 <- f2[f2$c1 %in% unlist(f1$Split),] > f3c1 c2 1 A Apple 2 B Badger 3 C Camel 5 E Elephant 6 F Fish 7 G Goat 8 H Horse # Note that 'D' is missing from f3 because it is not in any of the f1$Split lists # f4 is a subset of f3 and I want to find the rows of f1 where f1$Split contains any of f4$c1:> f4 <- f3[c(1,3),] > f4c1 c2 1 A Apple 3 C Camel # I tried this and it didn't work, presumably because it's trying to match against each list object rather than the list elements, but unlist doesn't do the trick here because I need the individual rows, I need to unlist on a row by row basis.> f1[f1$Split %in% f4$c1,][1] c1 c2 Split <0 rows> (or 0-length row.names)> f1[f4$c1 %in% f1$Split,][1] c1 c2 Split <0 rows> (or 0-length row.names)> f1[match(f4$c1, f1$Split),]c1 c2 Split NA NA <NA> NULL NA.1 NA <NA> NULL I also looked at reshape which I don't think helps. I thought I might be able to create a new data frame with the f1$Split denormalised and use that, but couldn't find a way to do this, the result I'd want there is something like:> f1_denormc1 c2 Split SplitDenorm 1 0 A,B,C A, B, C A 2 0 A,B,C A, B, C B 3 0 A,B,C A, B, C C 4 1 A,E A, E A 5 1 A,E A, E E 6 2 F,G,H F, G, H F 7 2 F,G,H F, G, H G 8 2 F,G,H F, G, H H I thought perhaps for loops would be the next thing to try, but there must be a better way! Thanks for any help. Tim [[alternative HTML version deleted]]
Tim Slidel
2009-Jul-01 19:31 UTC
[R] How should I denormalise a data frame list of lists column?
Hi, (apologies for initial html posting) I have a data frame where one column is a list of lists. I would like to subset the data frame based on membership of the lists in that column and be able to 'denormalise' the data frame so that a row is duplicated for each of its list elements. Example code follows: # The data is read in in this form with the c2 list values in single strings which I then split to give lists:> f1 <- data.frame(c1=0:2, c2=c("A,B,C", "A,E", "F,G,H")) > f1$Split <- strsplit(as.character(f1$c2), ",") > f1c1 c2 Split 1 0 A,B,C A, B, C 2 1 A,E A, E 3 2 F,G,H F, G, H # So f1$Split is the list of lists column I want to denormalise or use as the subject for subsetting # f2 is data to use to select subsets from f1> f2 <- data.frame(c1=LETTERS[0:8], c2=c("Apples","Badger","Camel","Dog","Elephants","Fish","Goat","Horse"))> f2c1 c2 1 A Apple 2 B Badger 3 C Camel 4 D Dog 5 E Elephant 6 F Fish 7 G Goat 8 H Horse # I was able to find which rows of f2 are represented in the f1 lists (not entirely sure if this is the best way to do this):> f3 <- f2[f2$c1 %in% unlist(f1$Split),] > f3c1 c2 1 A Apple 2 B Badger 3 C Camel 5 E Elephant 6 F Fish 7 G Goat 8 H Horse # Note that 'D' is missing from f3 because it is not in any of the f1$Split lists # f4 is a subset of f3 and I want to find the rows of f1 where f1$Split contains any of f4$c1:> f4 <- f3[c(1,3),] > f4c1 c2 1 A Apple 3 C Camel # I tried this and it didn't work, presumably because it's trying to match against each list object rather than the list elements, but unlist doesn't do the trick here because I need the individual rows, I need to unlist on a row by row basis.> f1[f1$Split %in% f4$c1,][1] c1 c2 Split <0 rows> (or 0-length row.names)> f1[f4$c1 %in% f1$Split,][1] c1 c2 Split <0 rows> (or 0-length row.names)> f1[match(f4$c1, f1$Split),]c1 c2 Split NA NA <NA> NULL NA.1 NA <NA> NULL I also looked at reshape which I don't think helps. I thought I might be able to create a new data frame with the f1$Split denormalised and use that, but couldn't find a way to do this, the result I'd want there is something like:> f1_denormc1 c2 Split SplitDenorm 1 0 A,B,C A, B, C A 2 0 A,B,C A, B, C B 3 0 A,B,C A, B, C C 4 1 A,E A, E A 5 1 A,E A, E E 6 2 F,G,H F, G, H F 7 2 F,G,H F, G, H G 8 2 F,G,H F, G, H H I thought perhaps for loops would be the next thing to try, but there must be a better way! Thanks for any help. Tim