Al Roark
2011-Mar-05 00:37 UTC
[R] Repeating the same calculation across multiple pairs of variables
Hi all,

I frequently encounter datasets that require me to repeat the same calculation across many variables. For example, given a dataset with total employment variables and manufacturing employment variables for the years 1990-2010, I might have to calculate manufacturing's share of total employment in each year. I find it cumbersome to have to manually define a share for each year and would like to know how others might handle this kind of task.

For example, given the data frame:

    df <- data.frame(a1=1:10, a2=11:20, a3=21:30, b1=101:110, b2=111:120, b3=121:130)

I'd like to append new variables (c1, c2, and c3) to the data frame that are the result of a1/b1, a2/b2, and a3/b3, respectively.

When there are only a few of these variables, I don't really have a problem, but it becomes a chore when the number of variables increases. Is there a way I can do this kind of processing using a loop? I tried defining a vector to hold the names for the "c variables" (e.g. c1, c2, ..., cn) and creating new variables in a loop using code like:

    avars <- c("a1","a2","a3")
    bvars <- c("b1","b2","b3")
    cvars <- c("c1","c2","c3")
    for(i in 1:3){
      df$cvars[i] <- df$avars[i]/df$bvars[i]
    }

But the variable references don't resolve properly with this particular syntax.

Any help would be much appreciated. Cheers.
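A minimal sketch of the loop from the question with only the indexing changed. `$` takes the text after it as a literal column name, so `df$cvars[i]` refers to a column called "cvars" rather than to the name stored in `cvars[i]`; `[[` evaluates its argument, so it picks up the stored name.

```r
df <- data.frame(a1 = 1:10, a2 = 11:20, a3 = 21:30,
                 b1 = 101:110, b2 = 111:120, b3 = 121:130)

avars <- c("a1", "a2", "a3")
bvars <- c("b1", "b2", "b3")
cvars <- c("c1", "c2", "c3")

# df[[name]] looks up the column whose name is the *value* of `name`,
# and assigning to a name that does not exist yet appends a new column.
for (i in seq_along(cvars)) {
  df[[cvars[i]]] <- df[[avars[i]]] / df[[bvars[i]]]
}

head(df, 3)
```

`seq_along(cvars)` is used instead of `1:3` so the loop adapts automatically when more variable pairs (e.g. one per year from 1990 to 2010) are added to the name vectors.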
Joshua Wiley
2011-Mar-05 03:00 UTC
Hi Al,

Assuming that the order of the columns selected by "avars" and "bvars" is identical (it is, at least in the example you gave), you can do:

    dat <- data.frame(a1=1:10, a2=11:20, a3=21:30, b1=101:110, b2=111:120, b3=121:130)
    avars <- paste("a", 1:3, sep = '')
    bvars <- paste("b", 1:3, sep = '')
    cvars <- paste("c", 1:3, sep = '')
    dat[, cvars] <- dat[, avars] / dat[, bvars]

If you are using character strings for the names, you need to use [ rather than $. For documentation, see ?"["

Hope this helps,

Josh

--
Joshua Wiley
Ph.D. Student, Health Psychology
University of California, Los Angeles
http://www.joshuawiley.com/
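Josh's vectorized assignment can also be written with base R's `Map()`, which applies `/` to each (a, b) column pair in turn. This is a sketch of an equivalent alternative, not the approach given in the thread; like Josh's version it assumes `avars` and `bvars` list matching columns in the same order.

```r
dat <- data.frame(a1 = 1:10, a2 = 11:20, a3 = 21:30,
                  b1 = 101:110, b2 = 111:120, b3 = 121:130)

avars <- paste0("a", 1:3)
bvars <- paste0("b", 1:3)
cvars <- paste0("c", 1:3)

# Map() walks the two column lists in parallel and returns a list of
# quotient vectors; assigning that list to dat[cvars] appends one new
# column per element, matched to cvars by position.
dat[cvars] <- Map(`/`, dat[avars], dat[bvars])

head(dat, 3)
```

Because the pairing is purely positional, the order of the name vectors matters more than the names themselves.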
Dennis Murphy
2011-Mar-05 05:33 UTC
Hi:

Perhaps you had something like this in mind:

    df <- data.frame(a1=1:10, a2=11:20, a3=21:30, b1=101:110, b2=111:120, b3=121:130)
    avars <- names(df)[grep('^a', names(df))]
    bvars <- names(df)[grep('^b', names(df))]
    cvars <- paste('c', 1:length(avars), sep = '')
    df <- within(df, for(i in seq_along(cvars))
                       assign(cvars[i], get(avars[i])/get(bvars[i])))

    > df
       a1 a2 a3  b1  b2  b3        c3        c2         c1 i
    1   1 11 21 101 111 121 0.1735537 0.0990991 0.00990099 3
    2   2 12 22 102 112 122 0.1803279 0.1071429 0.01960784 3
    3   3 13 23 103 113 123 0.1869919 0.1150442 0.02912621 3
    4   4 14 24 104 114 124 0.1935484 0.1228070 0.03846154 3
    5   5 15 25 105 115 125 0.2000000 0.1304348 0.04761905 3
    6   6 16 26 106 116 126 0.2063492 0.1379310 0.05660377 3
    7   7 17 27 107 117 127 0.2125984 0.1452991 0.06542056 3
    8   8 18 28 108 118 128 0.2187500 0.1525424 0.07407407 3
    9   9 19 29 109 119 129 0.2248062 0.1596639 0.08256881 3
    10 10 20 30 110 120 130 0.2307692 0.1666667 0.09090909 3

This should be extensible to any number of a* and b* variables, assuming that avars and bvars have the same length; that isn't checked above, nor is there any check for zeros among the b* variables.

HTH,
Dennis
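Dennis points out that neither the lengths of avars/bvars nor zero denominators are checked, and the printed output shows that within() also keeps the loop counter i as a column and adds c3, c2, c1 in reverse order. A minimal sketch of a helper that adds those checks and uses `[[` instead of within(); the function name `add_ratios` and its arguments are hypothetical, not part of the thread or of any package.

```r
# Hypothetical helper: append columns out[k] = df[[num[k]]] / df[[den[k]]].
add_ratios <- function(df, num, den, out) {
  # Every numerator needs a matching denominator and output name.
  stopifnot(length(num) == length(den), length(num) == length(out))

  # Every referenced column must exist in df.
  missing_cols <- setdiff(c(num, den), names(df))
  if (length(missing_cols) > 0)
    stop("columns not found: ", paste(missing_cols, collapse = ", "))

  # Warn (rather than fail) on zero denominators, which yield Inf or NaN.
  has_zero <- vapply(df[den], function(x) any(x == 0, na.rm = TRUE), logical(1))
  if (any(has_zero))
    warning("zero values in: ", paste(den[has_zero], collapse = ", "))

  # Plain loop over the pairs; no loop counter is left behind in df.
  for (k in seq_along(out)) {
    df[[out[k]]] <- df[[num[k]]] / df[[den[k]]]
  }
  df
}

df <- data.frame(a1 = 1:10, a2 = 11:20, a3 = 21:30,
                 b1 = 101:110, b2 = 111:120, b3 = 121:130)

avars <- grep("^a", names(df), value = TRUE)
bvars <- grep("^b", names(df), value = TRUE)
cvars <- paste0("c", seq_along(avars))

df <- add_ratios(df, avars, bvars, cvars)
names(df)  # c1, c2, c3 are appended in order, with no extra i column
```

Note that grep() returns names in the order they appear in the data frame, so the a*/b* pairing still relies on consistent column ordering; matching on the numeric suffix would be a stricter alternative.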