Al Roark
2011-Mar-05 00:37 UTC
[R] Repeating the same calculation across multiple pairs of variables
Hi all,
I frequently encounter datasets that require me to repeat the same calculation
across many variables. For example, given a dataset with total employment
variables and manufacturing employment variables for the years 1990-2010, I
might have to calculate manufacturing's share of total employment in each
year. I find it cumbersome to have to manually define a share for each year and
would like to know how others might handle this kind of task.
For example, given the data frame:
df<-data.frame(a1=1:10, a2=11:20, a3=21:30, b1=101:110, b2=111:120,
b3=121:130)
I'd like to append new variables--c1, c2, and c3--to the data frame that are
the result of a1/b1, a2/b2, and a3/b3, respectively.
When there are only a few of these variables, I don't really have a problem,
but it becomes a chore when the number of variables increases. Is there a way I
can do this kind of processing using a loop? I tried defining a vector to hold
the names for the "c variables" (e.g. c1,c2, ... cn) and creating new
variables in a loop using code like:
avars<-c("a1","a2","a3")
bvars<-c("b1","b2","b3")
cvars<-c("c1","c2","c3")
for(i in 1:3){
df$cvars[i]<-df$avars[i]/df$bvars[i]
}
But the variable references don't resolve properly with this particular
syntax.
Any help would be much appreciated. Cheers.
Joshua Wiley
2011-Mar-05 03:00 UTC
[R] Repeating the same calculation across multiple pairs of variables
Hi Al,
Assuming that the columns selected by "avars" and "bvars" come out in the
same order (they do, at least in the example you gave), you can do:
dat <- data.frame(a1=1:10, a2=11:20, a3=21:30, b1=101:110, b2=111:120,
b3=121:130)
avars <- paste("a", 1:3, sep = '')
bvars <- paste("b", 1:3, sep = '')
cvars <- paste("c", 1:3, sep = '')
dat[, cvars] <- dat[, avars] / dat[, bvars]
If you are using character strings for the names, you need to use [
rather than $. For documentation, see ?"["
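As a rough sketch of the same point applied to the loop from the question (this is an illustration, not part of the original reply): `[[` accepts a character string, so the loop works once `$` is swapped for `[[`. The names dat, avars, bvars and cvars are the ones defined above; the data frame is rebuilt here only so the snippet runs on its own.

```r
dat <- data.frame(a1 = 1:10, a2 = 11:20, a3 = 21:30,
                  b1 = 101:110, b2 = 111:120, b3 = 121:130)
avars <- paste("a", 1:3, sep = "")
bvars <- paste("b", 1:3, sep = "")
cvars <- paste("c", 1:3, sep = "")

# dat$avars looks for a column literally named "avars" and returns NULL;
# dat[[avars[i]]] looks up the column whose name is stored in avars[i].
for (i in seq_along(cvars)) {
  dat[[cvars[i]]] <- dat[[avars[i]]] / dat[[bvars[i]]]
}
```

The vectorised one-liner above is usually the shorter choice; the loop form may be easier to adapt when each pair needs a different calculation.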
Hope this helps,
Josh
--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
Dennis Murphy
2011-Mar-05 05:33 UTC
[R] Repeating the same calculation across multiple pairs of variables
Hi:
Perhaps you had something like this in mind:
df<-data.frame(a1=1:10, a2=11:20, a3=21:30, b1=101:110, b2=111:120,
b3=121:130)
avars <- names(df)[grep('^a', names(df))]
bvars <- names(df)[grep('^b', names(df))]
cvars <- paste('c', 1:length(avars), sep = '')
df <- within(df, for(i in seq_along(cvars)) assign(cvars[i],
             get(avars[i])/get(bvars[i])))
> df
   a1 a2 a3  b1  b2  b3        c3        c2         c1 i
1   1 11 21 101 111 121 0.1735537 0.0990991 0.00990099 3
2   2 12 22 102 112 122 0.1803279 0.1071429 0.01960784 3
3   3 13 23 103 113 123 0.1869919 0.1150442 0.02912621 3
4   4 14 24 104 114 124 0.1935484 0.1228070 0.03846154 3
5   5 15 25 105 115 125 0.2000000 0.1304348 0.04761905 3
6   6 16 26 106 116 126 0.2063492 0.1379310 0.05660377 3
7   7 17 27 107 117 127 0.2125984 0.1452991 0.06542056 3
8   8 18 28 108 118 128 0.2187500 0.1525424 0.07407407 3
9   9 19 29 109 119 129 0.2248062 0.1596639 0.08256881 3
10 10 20 30 110 120 130 0.2307692 0.1666667 0.09090909 3
This should extend to any number of a* and b* variables, provided that
avars and bvars have the same length. The code above does not check that,
nor does it check whether any value among the b* variables is zero.
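A minimal sketch of how those two unchecked conditions might be guarded (an illustration, not part of the original reply). It assumes that every column whose name starts with "a" or "b" belongs to the calculation and that the a* and b* columns appear in matching order in the data frame. It also assigns through `[` instead of within(), which keeps c1, c2, c3 in order and does not leave the loop index i behind as a column.

```r
df <- data.frame(a1 = 1:10, a2 = 11:20, a3 = 21:30,
                 b1 = 101:110, b2 = 111:120, b3 = 121:130)
avars <- grep("^a", names(df), value = TRUE)
bvars <- grep("^b", names(df), value = TRUE)
cvars <- paste("c", seq_along(avars), sep = "")

# Guard against the two conditions that are not checked above.
stopifnot(length(avars) == length(bvars))   # every a* needs a matching b*
stopifnot(all(df[bvars] != 0))              # no zero denominators

# Element-wise division of two equally shaped data frames; the result is
# stored under the new column names held in cvars.
df[cvars] <- df[avars] / df[bvars]
```

Whether a zero denominator should stop the script or simply produce Inf/NaN depends on the data; stopifnot() is just one possible choice here.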
HTH,
Dennis