thr3ads.net - R help - [R] Add a column to a dataframe based on multiple other column values [Jun 2013]

If this information is useful, please help other people find it:
Share via:

Tom Oates

2013-Jun-11 16:07 UTC

[R] Add a column to a dataframe based on multiple other column values

Hi
I have a dataframe as below:

x1    y1    x2    y2    x3    y3    output
2    100    190    99    1430    79    89
2    100    192    63    1431    75    69
2    100    192    63    1444    51    57
3    0    195    99    1499    50    74.5
3    0    198    98    1500    80    89
30    0    198    100    1451    97    65.66666667
32    100    868    100    1451    97    99
33    82    870    100    1490    97    93
33    0    871    82    1494    85    55.66666667


In reality the dataframe has pairs of columns x & y up to a large number.
As you can see from the column labelled output in the dataframe; I want to
calculate the mean of each row of the yn columns, but only to include each
yn value in the calculation of the mean if the corresponding xn column
value is greater than 10.
So for row 1; you will see that only y2 & y3 are included in calculating
the output column, but for row 6 y1-y3 are all included.
Because the number of paired x & y columns is large I am not sure the best
way to achieve this.
Thanks in advance
Tom

	[[alternative HTML version deleted]]

arun

2013-Jun-11 21:23 UTC

head link

[R] Add a column to a dataframe based on multiple other column values

HI,
May be this helps:
dat1<- read.table(text="
x1??? y1??? x2??? y2??? x3??? y3??? output
2??? 100??? 190??? 99??? 1430??? 79??? 89
2??? 100??? 192??? 63??? 1431??? 75??? 69
2??? 100??? 192??? 63??? 1444??? 51??? 57
3??? 0??? 195??? 99??? 1499??? 50??? 74.5
3??? 0??? 198??? 98??? 1500??? 80??? 89
30??? 0??? 198??? 100??? 1451??? 97??? 65.66666667
32??? 100??? 868??? 100??? 1451??? 97??? 99
33??? 82??? 870??? 100??? 1490??? 97??? 93
33??? 0??? 871??? 82??? 1494??? 85??? 55.66666667
",sep="",header=TRUE)

dat1$output2<-apply(dat1[,-7],1,function(x)
{indx<-((seq(x)-1)%%2+1);indx1<-indx==1;
indx2<-indx==2;mean(x[indx2][x[indx1]>10])})
?dat1
#? x1? y1? x2? y2?? x3 y3?? output? output2
#1? 2 100 190? 99 1430 79 89.00000 89.00000
#2? 2 100 192? 63 1431 75 69.00000 69.00000
#3? 2 100 192? 63 1444 51 57.00000 57.00000
#4? 3?? 0 195? 99 1499 50 74.50000 74.50000
#5? 3?? 0 198? 98 1500 80 89.00000 89.00000
#6 30?? 0 198 100 1451 97 65.66667 65.66667
#7 32 100 868 100 1451 97 99.00000 99.00000
#8 33? 82 870 100 1490 97 93.00000 93.00000
#9 33?? 0 871? 82 1494 85 55.66667 55.66667
A.K.



----- Original Message -----
From: Tom Oates <toates19 at gmail.com>
To: r-help at r-project.org
Cc: 
Sent: Tuesday, June 11, 2013 12:07 PM
Subject: [R] Add a column to a dataframe based on multiple other column
	values

Hi
I have a dataframe as below:

x1? ? y1? ? x2? ? y2? ? x3? ? y3? ? output
2? ? 100? ? 190? ? 99? ? 1430? ? 79? ? 89
2? ? 100? ? 192? ? 63? ? 1431? ? 75? ? 69
2? ? 100? ? 192? ? 63? ? 1444? ? 51? ? 57
3? ? 0? ? 195? ? 99? ? 1499? ? 50? ? 74.5
3? ? 0? ? 198? ? 98? ? 1500? ? 80? ? 89
30? ? 0? ? 198? ? 100? ? 1451? ? 97? ? 65.66666667
32? ? 100? ? 868? ? 100? ? 1451? ? 97? ? 99
33? ? 82? ? 870? ? 100? ? 1490? ? 97? ? 93
33? ? 0? ? 871? ? 82? ? 1494? ? 85? ? 55.66666667


In reality the dataframe has pairs of columns x & y up to a large number.
As you can see from the column labelled output in the dataframe; I want to
calculate the mean of each row of the yn columns, but only to include each
yn value in the calculation of the mean if the corresponding xn column
value is greater than 10.
So for row 1; you will see that only y2 & y3 are included in calculating
the output column, but for row 6 y1-y3 are all included.
Because the number of paired x & y columns is large I am not sure the best
way to achieve this.
Thanks in advance
Tom

??? [[alternative HTML version deleted]]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Keith S Weintraub

2013-Jun-12 11:12 UTC

head link

[R] Add a column to a dataframe based on multiple other column values

Tom,

Here is my solution. Note that I assume the columns are interleaved as you
describe below. I'm sure others will have better replies.

Note that using dput helps the helpers.

# From dput(mdat)
mdat<-structure(list(x1 = c(2L, 2L, 2L, 3L, 3L, 30L, 32L, 33L, 33L), 
    y1 = c(100L, 100L, 100L, 0L, 0L, 0L, 100L, 82L, 0L), x2 = c(190L, 
    192L, 192L, 195L, 198L, 198L, 868L, 870L, 871L), y2 = c(99L, 
    63L, 63L, 99L, 98L, 100L, 100L, 100L, 82L), x3 = c(1430L, 
    1431L, 1444L, 1499L, 1500L, 1451L, 1451L, 1490L, 1494L), 
    y3 = c(79L, 75L, 51L, 50L, 80L, 97L, 97L, 97L, 85L), output = c(89, 
    69, 57, 74.5, 89, 65.66666667, 99, 93, 55.66666667)), .Names =
c("x1",
"y1", "x2", "y2", "x3", "y3",
"output"), class = "data.frame", row.names = c(NA,
-9L))

mdat.pure<-mdat[,-ncol(mdat)]

# Function to apply to rows
theFunk<-function(x) {
  nxy<-length(x)/2
  idx<-seq_len(nxy)
  xvec<-x[idx*2 - 1]
  yvec<-x[idx*2]
  mean(yvec[xvec>10])
}

# Apply the function to rows
output<-apply(mdat.pure, 1, theFunk)

Or 

mdat.pure$output<-apply(mdat.pure, 1, theFunk)

will put the calculated column at the end of mdat.pure.

Note that I haven't taken account of missing values.

Hope this helps,
KW

--

On Jun 12, 2013, at 6:00 AM, r-help-request at r-project.org wrote:
> Message: 35
> Date: Tue, 11 Jun 2013 17:07:12 +0100
> From: Tom Oates <toates19 at gmail.com>
> To: r-help at r-project.org
> Subject: [R] Add a column to a dataframe based on multiple other
> 	column	values
> Message-ID:
> 	<CAGUdn1CxLfxXNzDWQUo515h_h5qeKFMUyG5MsDb1qn6gBQ7cVg at
mail.gmail.com>
> Content-Type: text/plain
> 
> Hi
> I have a dataframe as below:
> 
> x1    y1    x2    y2    x3    y3    output
> 2    100    190    99    1430    79    89
> 2    100    192    63    1431    75    69
> 2    100    192    63    1444    51    57
> 3    0    195    99    1499    50    74.5
> 3    0    198    98    1500    80    89
> 30    0    198    100    1451    97    65.66666667
> 32    100    868    100    1451    97    99
> 33    82    870    100    1490    97    93
> 33    0    871    82    1494    85    55.66666667
> 
> 
> In reality the dataframe has pairs of columns x & y up to a large
number.
> As you can see from the column labelled output in the dataframe; I want to
> calculate the mean of each row of the yn columns, but only to include each
> yn value in the calculation of the mean if the corresponding xn column
> value is greater than 10.
> So for row 1; you will see that only y2 & y3 are included in
calculating
> the output column, but for row 6 y1-y3 are all included.
> Because the number of paired x & y columns is large I am not sure the
best
> way to achieve this.
> Thanks in advance
> Tom

R help - Jun 2013 - Add a column to a dataframe based on multiple other column values

[R] Add a column to a dataframe based on multiple other column values

[R] Add a column to a dataframe based on multiple other column values

[R] Add a column to a dataframe based on multiple other column values