Brock Huntsman
2014-Dec-03 21:14 UTC
[R] combining unequal dataframes based on a common grouping factor
I apologize if this is a relatively easy problem, but have been stuck on
this issue for a few days. I am attempting to combine values from 2
separate dataframes. Each dataframe contains a shared identifier (GROUP).
Dataframe 1 (3272 rows x 3 columns) further divides this shared grouping
factor into unique identifiers (ID), as well as contains the proportion of
the GROUP area of which the unique identifier consists (PROP_AREA).
Dataframe 2 (291 x 14976) in addition to consisting of the shared
identifier, also has numerous columns consisting of values (VALUE1,
VALUE2). I would like to multiply the PROP_AREA in dataframe 1 by each
value in dataframe 2 (VALUE1 through VALUE14976) based on the GROUP factor,
constructing a final dataframe of size 3272 x 14976. An example of the data
frames are as follows:
frame1:
ID
GROUP
PROP_AREA
1
A
0.33
2
A
0.33
3
A
0.33
4
B
0.50
5
B
0.50
6
C
1.00
7
D
1.00
frame2:
GROUP
VALUE1
VALUE2
A
10
5
B
20
10
C
30
15
D
40
20
Desired dataframe
frame3:
ID
VALUE1
VALUE2
1
3.3
1.65
2
3.3
1.65
3
3.3
1.65
4
10
5
5
10
5
6
30
15
7
40
20
I assume I would need to use the %in% function or if statements, but am
unsure how to write the code. I have attempted to construct a for loop with
an if statement, but have not been successful as of yet.
for(i in 1:nrow(frame1)) {
for(j in 2:ncol(frame2)) {
if (frame1$GROUP[i] == frame2$GROUP[i]) {
frame3[i,j+1] <- frame1$PROP_AREA[i]*frame2[i,j+1]
}
}
}
Any advice on suggested code or packages to read up on would be much
appreciated.
Brock
[[alternative HTML version deleted]]
Jeff Newmiller
2014-Dec-03 21:40 UTC
[R] combining unequal dataframes based on a common grouping factor
Posting in HTML format doesn't work nearly as well as you think it does...
Your email is pretty mixed up. Please use plain text format and use dput to make
your data usable in R.
I expect the best answer to your problem is going to be to use the merge
function instead of your for loops.. but the actual data can affect how well any
solution works so giving us dput output is crucial.
---------------------------------------------------------------------------
Jeff Newmiller The ..... ..... Go Live...
DCN:<jdnewmil at dcn.davis.ca.us> Basics: ##.#. ##.#. Live
Go...
Live: OO#.. Dead: OO#.. Playing
Research Engineer (Solar/Batteries O.O#. #.O#. with
/Software/Embedded Controllers) .OO#. .OO#. rocks...1k
---------------------------------------------------------------------------
Sent from my phone. Please excuse my brevity.
On December 3, 2014 1:14:16 PM PST, Brock Huntsman <brockhunts at
gmail.com> wrote:>I apologize if this is a relatively easy problem, but have been stuck
>on
>this issue for a few days. I am attempting to combine values from 2
>separate dataframes. Each dataframe contains a shared identifier
>(GROUP).
>Dataframe 1 (3272 rows x 3 columns) further divides this shared
>grouping
>factor into unique identifiers (ID), as well as contains the proportion
>of
>the GROUP area of which the unique identifier consists (PROP_AREA).
>Dataframe 2 (291 x 14976) in addition to consisting of the shared
>identifier, also has numerous columns consisting of values (VALUE1,
>VALUE2). I would like to multiply the PROP_AREA in dataframe 1 by each
>value in dataframe 2 (VALUE1 through VALUE14976) based on the GROUP
>factor,
>constructing a final dataframe of size 3272 x 14976. An example of the
>data
>frames are as follows:
>
>
>frame1:
>
>ID
>
>GROUP
>
>PROP_AREA
>
>1
>
>A
>
>0.33
>
>2
>
>A
>
>0.33
>
>3
>
>A
>
>0.33
>
>4
>
>B
>
>0.50
>
>5
>
>B
>
>0.50
>
>6
>
>C
>
>1.00
>
>7
>
>D
>
>1.00
>
>
>
>frame2:
>
>GROUP
>
>VALUE1
>
>VALUE2
>
>A
>
>10
>
>5
>
>B
>
>20
>
>10
>
>C
>
>30
>
>15
>
>D
>
>40
>
>20
>
>
>
> Desired dataframe
>
>frame3:
>
>ID
>
>VALUE1
>
>VALUE2
>
>1
>
>3.3
>
>1.65
>
>2
>
>3.3
>
>1.65
>
>3
>
>3.3
>
>1.65
>
>4
>
>10
>
>5
>
>5
>
>10
>
>5
>
>6
>
>30
>
>15
>
>7
>
>40
>
>20
>
>
>
>
>
>I assume I would need to use the %in% function or if statements, but am
>unsure how to write the code. I have attempted to construct a for loop
>with
>an if statement, but have not been successful as of yet.
>
>
>for(i in 1:nrow(frame1)) {
>
> for(j in 2:ncol(frame2)) {
>
> if (frame1$GROUP[i] == frame2$GROUP[i]) {
>
> frame3[i,j+1] <- frame1$PROP_AREA[i]*frame2[i,j+1]
>
> }
>
> }
>
>}
>
>
>Any advice on suggested code or packages to read up on would be much
>appreciated.
>
>Brock
>
> [[alternative HTML version deleted]]
>
>______________________________________________
>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.
Chel Hee Lee
2014-Dec-04 02:17 UTC
[R] combining unequal dataframes based on a common grouping factor
> frame1ID GROUP PROP_AREA 1 1 A 0.33 2 2 A 0.33 3 3 A 0.33 4 4 B 0.50 5 5 B 0.50 6 6 C 1.00 7 7 D 1.00 > frame2 GROUP VALUE1 VALUE2 1 A 10 5 2 B 20 10 3 C 30 15 4 D 40 20 > > obj1 <- merge(x=frame1, y=frame2, by="GROUP") > obj1$rval1 <- obj1$PROP_AREA * obj1$VALUE1 > obj1$rval2 <- obj1$PROP_AREA * obj1$VALUE2 > obj1 GROUP ID PROP_AREA VALUE1 VALUE2 rval1 rval2 1 A 1 0.33 10 5 3.3 1.65 2 A 2 0.33 10 5 3.3 1.65 3 A 3 0.33 10 5 3.3 1.65 4 B 4 0.50 20 10 10.0 5.00 5 B 5 0.50 20 10 10.0 5.00 6 C 6 1.00 30 15 30.0 15.00 7 D 7 1.00 40 20 40.0 20.00 > > idx <- match(x=frame1$GROUP, table=frame2$GROUP) > rval1 <- frame1["PROP_AREA"] * frame2[idx, "VALUE1"] > rval2 <- frame1["PROP_AREA"] * frame2[idx, "VALUE2"] > cbind("ID"=frame1[idx, "ID"], rval1, rval2) ID PROP_AREA PROP_AREA 1 1 3.3 1.65 2 1 3.3 1.65 3 1 3.3 1.65 4 2 10.0 5.00 5 2 10.0 5.00 6 3 30.0 15.00 7 4 40.0 20.00 > Is this what you are looking for? I hope this helps. Chel Hee Lee On 12/03/2014 03:14 PM, Brock Huntsman wrote:> I apologize if this is a relatively easy problem, but have been stuck on > this issue for a few days. I am attempting to combine values from 2 > separate dataframes. Each dataframe contains a shared identifier (GROUP). > Dataframe 1 (3272 rows x 3 columns) further divides this shared grouping > factor into unique identifiers (ID), as well as contains the proportion of > the GROUP area of which the unique identifier consists (PROP_AREA). > Dataframe 2 (291 x 14976) in addition to consisting of the shared > identifier, also has numerous columns consisting of values (VALUE1, > VALUE2). I would like to multiply the PROP_AREA in dataframe 1 by each > value in dataframe 2 (VALUE1 through VALUE14976) based on the GROUP factor, > constructing a final dataframe of size 3272 x 14976. An example of the data > frames are as follows: > > > frame1: > > ID > > GROUP > > PROP_AREA > > 1 > > A > > 0.33 > > 2 > > A > > 0.33 > > 3 > > A > > 0.33 > > 4 > > B > > 0.50 > > 5 > > B > > 0.50 > > 6 > > C > > 1.00 > > 7 > > D > > 1.00 > > > > frame2: > > GROUP > > VALUE1 > > VALUE2 > > A > > 10 > > 5 > > B > > 20 > > 10 > > C > > 30 > > 15 > > D > > 40 > > 20 > > > > Desired dataframe > > frame3: > > ID > > VALUE1 > > VALUE2 > > 1 > > 3.3 > > 1.65 > > 2 > > 3.3 > > 1.65 > > 3 > > 3.3 > > 1.65 > > 4 > > 10 > > 5 > > 5 > > 10 > > 5 > > 6 > > 30 > > 15 > > 7 > > 40 > > 20 > > > > > > I assume I would need to use the %in% function or if statements, but am > unsure how to write the code. I have attempted to construct a for loop with > an if statement, but have not been successful as of yet. > > > for(i in 1:nrow(frame1)) { > > for(j in 2:ncol(frame2)) { > > if (frame1$GROUP[i] == frame2$GROUP[i]) { > > frame3[i,j+1] <- frame1$PROP_AREA[i]*frame2[i,j+1] > > } > > } > > } > > > Any advice on suggested code or packages to read up on would be much > appreciated. > > Brock > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. >