Brock Huntsman
2014-Dec-03 21:14 UTC
[R] combining unequal dataframes based on a common grouping factor
I apologize if this is a relatively easy problem, but have been stuck on this issue for a few days. I am attempting to combine values from 2 separate dataframes. Each dataframe contains a shared identifier (GROUP). Dataframe 1 (3272 rows x 3 columns) further divides this shared grouping factor into unique identifiers (ID), as well as contains the proportion of the GROUP area of which the unique identifier consists (PROP_AREA). Dataframe 2 (291 x 14976) in addition to consisting of the shared identifier, also has numerous columns consisting of values (VALUE1, VALUE2). I would like to multiply the PROP_AREA in dataframe 1 by each value in dataframe 2 (VALUE1 through VALUE14976) based on the GROUP factor, constructing a final dataframe of size 3272 x 14976. An example of the data frames are as follows: frame1: ID GROUP PROP_AREA 1 A 0.33 2 A 0.33 3 A 0.33 4 B 0.50 5 B 0.50 6 C 1.00 7 D 1.00 frame2: GROUP VALUE1 VALUE2 A 10 5 B 20 10 C 30 15 D 40 20 Desired dataframe frame3: ID VALUE1 VALUE2 1 3.3 1.65 2 3.3 1.65 3 3.3 1.65 4 10 5 5 10 5 6 30 15 7 40 20 I assume I would need to use the %in% function or if statements, but am unsure how to write the code. I have attempted to construct a for loop with an if statement, but have not been successful as of yet. for(i in 1:nrow(frame1)) { for(j in 2:ncol(frame2)) { if (frame1$GROUP[i] == frame2$GROUP[i]) { frame3[i,j+1] <- frame1$PROP_AREA[i]*frame2[i,j+1] } } } Any advice on suggested code or packages to read up on would be much appreciated. Brock [[alternative HTML version deleted]]
Jeff Newmiller
2014-Dec-03 21:40 UTC
[R] combining unequal dataframes based on a common grouping factor
Posting in HTML format doesn't work nearly as well as you think it does... Your email is pretty mixed up. Please use plain text format and use dput to make your data usable in R. I expect the best answer to your problem is going to be to use the merge function instead of your for loops.. but the actual data can affect how well any solution works so giving us dput output is crucial. --------------------------------------------------------------------------- Jeff Newmiller The ..... ..... Go Live... DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live Go... Live: OO#.. Dead: OO#.. Playing Research Engineer (Solar/Batteries O.O#. #.O#. with /Software/Embedded Controllers) .OO#. .OO#. rocks...1k --------------------------------------------------------------------------- Sent from my phone. Please excuse my brevity. On December 3, 2014 1:14:16 PM PST, Brock Huntsman <brockhunts at gmail.com> wrote:>I apologize if this is a relatively easy problem, but have been stuck >on >this issue for a few days. I am attempting to combine values from 2 >separate dataframes. Each dataframe contains a shared identifier >(GROUP). >Dataframe 1 (3272 rows x 3 columns) further divides this shared >grouping >factor into unique identifiers (ID), as well as contains the proportion >of >the GROUP area of which the unique identifier consists (PROP_AREA). >Dataframe 2 (291 x 14976) in addition to consisting of the shared >identifier, also has numerous columns consisting of values (VALUE1, >VALUE2). I would like to multiply the PROP_AREA in dataframe 1 by each >value in dataframe 2 (VALUE1 through VALUE14976) based on the GROUP >factor, >constructing a final dataframe of size 3272 x 14976. An example of the >data >frames are as follows: > > >frame1: > >ID > >GROUP > >PROP_AREA > >1 > >A > >0.33 > >2 > >A > >0.33 > >3 > >A > >0.33 > >4 > >B > >0.50 > >5 > >B > >0.50 > >6 > >C > >1.00 > >7 > >D > >1.00 > > > >frame2: > >GROUP > >VALUE1 > >VALUE2 > >A > >10 > >5 > >B > >20 > >10 > >C > >30 > >15 > >D > >40 > >20 > > > > Desired dataframe > >frame3: > >ID > >VALUE1 > >VALUE2 > >1 > >3.3 > >1.65 > >2 > >3.3 > >1.65 > >3 > >3.3 > >1.65 > >4 > >10 > >5 > >5 > >10 > >5 > >6 > >30 > >15 > >7 > >40 > >20 > > > > > >I assume I would need to use the %in% function or if statements, but am >unsure how to write the code. I have attempted to construct a for loop >with >an if statement, but have not been successful as of yet. > > >for(i in 1:nrow(frame1)) { > > for(j in 2:ncol(frame2)) { > > if (frame1$GROUP[i] == frame2$GROUP[i]) { > > frame3[i,j+1] <- frame1$PROP_AREA[i]*frame2[i,j+1] > > } > > } > >} > > >Any advice on suggested code or packages to read up on would be much >appreciated. > >Brock > > [[alternative HTML version deleted]] > >______________________________________________ >R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see >https://stat.ethz.ch/mailman/listinfo/r-help >PLEASE do read the posting guide >http://www.R-project.org/posting-guide.html >and provide commented, minimal, self-contained, reproducible code.
Chel Hee Lee
2014-Dec-04 02:17 UTC
[R] combining unequal dataframes based on a common grouping factor
> frame1ID GROUP PROP_AREA 1 1 A 0.33 2 2 A 0.33 3 3 A 0.33 4 4 B 0.50 5 5 B 0.50 6 6 C 1.00 7 7 D 1.00 > frame2 GROUP VALUE1 VALUE2 1 A 10 5 2 B 20 10 3 C 30 15 4 D 40 20 > > obj1 <- merge(x=frame1, y=frame2, by="GROUP") > obj1$rval1 <- obj1$PROP_AREA * obj1$VALUE1 > obj1$rval2 <- obj1$PROP_AREA * obj1$VALUE2 > obj1 GROUP ID PROP_AREA VALUE1 VALUE2 rval1 rval2 1 A 1 0.33 10 5 3.3 1.65 2 A 2 0.33 10 5 3.3 1.65 3 A 3 0.33 10 5 3.3 1.65 4 B 4 0.50 20 10 10.0 5.00 5 B 5 0.50 20 10 10.0 5.00 6 C 6 1.00 30 15 30.0 15.00 7 D 7 1.00 40 20 40.0 20.00 > > idx <- match(x=frame1$GROUP, table=frame2$GROUP) > rval1 <- frame1["PROP_AREA"] * frame2[idx, "VALUE1"] > rval2 <- frame1["PROP_AREA"] * frame2[idx, "VALUE2"] > cbind("ID"=frame1[idx, "ID"], rval1, rval2) ID PROP_AREA PROP_AREA 1 1 3.3 1.65 2 1 3.3 1.65 3 1 3.3 1.65 4 2 10.0 5.00 5 2 10.0 5.00 6 3 30.0 15.00 7 4 40.0 20.00 > Is this what you are looking for? I hope this helps. Chel Hee Lee On 12/03/2014 03:14 PM, Brock Huntsman wrote:> I apologize if this is a relatively easy problem, but have been stuck on > this issue for a few days. I am attempting to combine values from 2 > separate dataframes. Each dataframe contains a shared identifier (GROUP). > Dataframe 1 (3272 rows x 3 columns) further divides this shared grouping > factor into unique identifiers (ID), as well as contains the proportion of > the GROUP area of which the unique identifier consists (PROP_AREA). > Dataframe 2 (291 x 14976) in addition to consisting of the shared > identifier, also has numerous columns consisting of values (VALUE1, > VALUE2). I would like to multiply the PROP_AREA in dataframe 1 by each > value in dataframe 2 (VALUE1 through VALUE14976) based on the GROUP factor, > constructing a final dataframe of size 3272 x 14976. An example of the data > frames are as follows: > > > frame1: > > ID > > GROUP > > PROP_AREA > > 1 > > A > > 0.33 > > 2 > > A > > 0.33 > > 3 > > A > > 0.33 > > 4 > > B > > 0.50 > > 5 > > B > > 0.50 > > 6 > > C > > 1.00 > > 7 > > D > > 1.00 > > > > frame2: > > GROUP > > VALUE1 > > VALUE2 > > A > > 10 > > 5 > > B > > 20 > > 10 > > C > > 30 > > 15 > > D > > 40 > > 20 > > > > Desired dataframe > > frame3: > > ID > > VALUE1 > > VALUE2 > > 1 > > 3.3 > > 1.65 > > 2 > > 3.3 > > 1.65 > > 3 > > 3.3 > > 1.65 > > 4 > > 10 > > 5 > > 5 > > 10 > > 5 > > 6 > > 30 > > 15 > > 7 > > 40 > > 20 > > > > > > I assume I would need to use the %in% function or if statements, but am > unsure how to write the code. I have attempted to construct a for loop with > an if statement, but have not been successful as of yet. > > > for(i in 1:nrow(frame1)) { > > for(j in 2:ncol(frame2)) { > > if (frame1$GROUP[i] == frame2$GROUP[i]) { > > frame3[i,j+1] <- frame1$PROP_AREA[i]*frame2[i,j+1] > > } > > } > > } > > > Any advice on suggested code or packages to read up on would be much > appreciated. > > Brock > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >