Ben Fairbank
2007-Jan-19 17:54 UTC
[R] Newbie question: Statistical functions (e.g., mean, sd) in a "transform" statement?
Greetings listeRs -

Given a data frame such as

times

      time1    time2     time3    time4
1 70.408543 48.92378  7.399605 95.93050
2 17.231940 27.48530 82.962916 10.20619
3 20.279220 10.33575 66.209290 30.71846
4        NA 53.31993 12.398237 35.65782
5  9.295965       NA 48.929201       NA
6 63.966518 42.16304  1.777342       NA

one can use "transform" to total all or some columns, thus,

times2 <- transform(times, totaltime = time1 + time2 + time3 + time4)

> times2
      time1    time2     time3    time4 totaltime
1 70.408543 48.92378  7.399605 95.93050  222.6624
2 17.231940 27.48530 82.962916 10.20619  137.8863
3 20.279220 10.33575 66.209290 30.71846  127.5427
4        NA 53.31993 12.398237 35.65782        NA
5  9.295965       NA 48.929201       NA        NA
6 63.966518 42.16304  1.777342       NA        NA

I cannot, however, find a way, other than "for" looping, to use statistical functions, such as mean or sd, to compute the new column. For example,

> times2 <- transform(times, meantime = (mean(c(time1, time2, time3, time4), na.rm = TRUE)))
> times2
      time1    time2     time3    time4 meantime
1 70.408543 48.92378  7.399605 95.93050 45.54178
2 17.231940 27.48530 82.962916 10.20619 45.54178
3 20.279220 10.33575 66.209290 30.71846 45.54178
4        NA 53.31993 12.398237 35.65782 45.54178
5  9.295965       NA 48.929201       NA 45.54178
6 63.966518 42.16304  1.777342       NA 45.54178

How can this be done? And, generally, what is the recommended method for creating computed new columns in data frames when "for" loops take too long?

With thanks for any suggestions,

Ben Fairbank

Using version 2.4.1 on a Windows XP professional operating system.
Charles C. Berry
2007-Jan-19 18:36 UTC
[R] Newbie question: Statistical functions (e.g., mean, sd) in a "transform" statement?
Ben,

transform() is probably the wrong tool if what you want is to 'apply a function' to the corresponding elements of time1, time2, ... , and return a vector of results.

If this is what you are after, the 'apply' family of functions is what you want. See ?apply and ?mapply and the 'See Also's on each page.

Chuck Berry

Charles C. Berry                         (858) 534-2098
                                         Dept of Family/Preventive Medicine
E mailto:cberry at tajo.ucsd.edu         UC San Diego
http://biostat.ucsd.edu/~cberry/         La Jolla, San Diego 92093-0901
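The pointer to ?apply and ?mapply can be made concrete with a minimal sketch on the data from the original question (re-entered by hand here). The names `pooled`, `row_mean`, `row_sd` and `row_mean2` are introduced only for illustration and do not come from the thread.

```r
# Ben's example data, re-entered by hand so the sketch is self-contained.
times <- data.frame(
  time1 = c(70.408543, 17.231940, 20.279220, NA, 9.295965, 63.966518),
  time2 = c(48.92378, 27.48530, 10.33575, 53.31993, NA, 42.16304),
  time3 = c(7.399605, 82.962916, 66.209290, 12.398237, 48.929201, 1.777342),
  time4 = c(95.93050, 10.20619, 30.71846, 35.65782, NA, NA)
)

# c(time1, ..., time4) glues the four columns into one long vector, so mean()
# returns a single number; transform() then recycles it down the new column.
pooled <- with(times, mean(c(time1, time2, time3, time4), na.rm = TRUE))

# apply() with MARGIN = 1 calls the function once per row instead.
row_mean <- apply(times, 1, mean, na.rm = TRUE)
row_sd   <- apply(times, 1, sd, na.rm = TRUE)

# mapply() walks the columns in parallel, taking one element from each per call.
row_mean2 <- mapply(function(w, x, y, z) mean(c(w, x, y, z), na.rm = TRUE),
                    times$time1, times$time2, times$time3, times$time4)

# Attach the per-row results as new columns.
times2 <- cbind(times, meantime = row_mean, sdtime = row_sd)
```

Both apply() and mapply() still loop internally, so they are mainly a tidier way to write the loop rather than a guaranteed speed-up.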
Michael Kubovy
2007-Jan-19 18:46 UTC
[R] Newbie question: Statistical functions (e.g., mean, sd) in a "transform" statement?
On Jan 19, 2007, at 12:54 PM, Ben Fairbank wrote:

> Given a data frame such as
>
> times
>
>       time1    time2     time3    time4
> 1 70.408543 48.92378  7.399605 95.93050
> 2 17.231940 27.48530 82.962916 10.20619
> 3 20.279220 10.33575 66.209290 30.71846
> 4        NA 53.31993 12.398237 35.65782
> 5  9.295965       NA 48.929201       NA
> 6 63.966518 42.16304  1.777342       NA
>
> I cannot, however, find a way, other than "for" looping,
> to use statistical functions, such as mean or sd, to
> compute the new column.

times <- data.frame(time1 = rnorm(6, 50, 20), time2 = rnorm(6, 40, 15),
                    time3 = rnorm(6, 60, 25), time4 = rnorm(6, 55, 23))
times[4, 1] <- NA
times[5, c(2, 4)] <- NA
times[6, 4] <- NA
times$totaltime <- apply(times, 1, sum, na.rm = TRUE)
times$meantime <- apply(times, 1, mean, na.rm = TRUE)
times$sdtime <- apply(times, 1, sd, na.rm = TRUE)

     time1    time2    time3    time4 totaltime meantime   sdtime
1 28.84859 29.94037 92.11518 71.80472 222.70886 89.08354 71.11911
2 50.72260 39.02439 61.18364 31.63962 182.57024 73.02810 55.68944
3 11.75829 28.61262 72.37066 79.23817 191.97974 76.79189 62.99902
4       NA 27.23659 75.69952 38.19262 141.12872 70.56436 44.52787
5 31.05109       NA 52.41755       NA  83.46864 55.64576 21.52078
6 54.01291 52.48922 53.97689       NA 160.47902 80.23951 46.33038

_____________________________
Professor Michael Kubovy
University of Virginia
Department of Psychology
USPS:     P.O.Box 400400    Charlottesville, VA 22904-4400
Parcels:  Room 102 Gilmer Hall
          McCormick Road    Charlottesville, VA 22903
Office:   B011    +1-434-982-4729
Lab:      B019    +1-434-982-4751
Fax:      +1-434-982-4766
WWW:      http://www.people.virginia.edu/~mk9y/
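One caveat about the code as posted: each apply() call runs over every column present at that moment, so meantime averages the four times together with totaltime (row 1: 2 x 222.70886 / 5 = 89.08354), and sdtime additionally takes in meantime. A minimal sketch that restricts the calculation to the four time columns follows, assuming those are the only columns meant to enter the statistics. It uses the data from the original question rather than random draws, and `time_cols` and `vals` are names introduced here.

```r
# Ben's example data, re-entered by hand so the sketch is self-contained.
times <- data.frame(
  time1 = c(70.408543, 17.231940, 20.279220, NA, 9.295965, 63.966518),
  time2 = c(48.92378, 27.48530, 10.33575, 53.31993, NA, 42.16304),
  time3 = c(7.399605, 82.962916, 66.209290, 12.398237, 48.929201, 1.777342),
  time4 = c(95.93050, 10.20619, 30.71846, 35.65782, NA, NA)
)

# Take a snapshot of the input columns first, so that columns added
# later cannot leak into the row-wise calculations.
time_cols <- c("time1", "time2", "time3", "time4")
vals <- as.matrix(times[time_cols])

times$totaltime <- apply(vals, 1, sum, na.rm = TRUE)
times$meantime  <- apply(vals, 1, mean, na.rm = TRUE)
times$sdtime    <- apply(vals, 1, sd, na.rm = TRUE)
```

With this arrangement meantime equals totaltime divided by the number of non-missing values in each row, which is a quick consistency check.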
Gabor Grothendieck
2007-Jan-19 18:51 UTC
[R] Newbie question: Statistical functions (e.g., mean, sd) in a "transform" statement?
Try this using the builtin data set anscombe:

transform(anscombe, rowMeans = rowMeans(anscombe))
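Note that rowMeans(anscombe) averages across all eight columns of anscombe (x1 to x4 and y1 to y4). If only some columns should enter the mean, one option is to index the data frame inside the call. This is a sketch under that assumption; `x_cols`, `xmean` and `res` are illustrative names, not part of the original reply.

```r
# anscombe ships with base R (package datasets): columns x1..x4 and y1..y4.
x_cols <- c("x1", "x2", "x3", "x4")

# rowMeans() accepts a data frame of numeric columns; selecting the columns
# by name keeps the y columns out of the average.
res <- transform(anscombe, xmean = rowMeans(anscombe[x_cols]))

head(res)
```

rowMeans() also takes na.rm = TRUE, which matters for data with missing values such as the `times` example.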
Gavin Simpson
2007-Jan-19 18:53 UTC
[R] Newbie question: Statistical functions (e.g., mean, sd) in a "transform" statement?
On Fri, 2007-01-19 at 11:54 -0600, Ben Fairbank wrote:
> Greetings listeRs -

Here are two solutions, depending on whether you wanted the NA's or not, and I assume you wanted the row means:

> times3 <- transform(times, meantime = rowMeans(times))
> times3
      time1    time2     time3    time4 meantime
1 70.408543 48.92378  7.399605 95.93050 55.66561
2 17.231940 27.48530 82.962916 10.20619 34.47159
3 20.279220 10.33575 66.209290 30.71846 31.88568
4        NA 53.31993 12.398237 35.65782       NA
5  9.295965       NA 48.929201       NA       NA
6 63.966518 42.16304  1.777342       NA       NA

> times4 <- transform(times, meantime = rowMeans(times, na.rm = TRUE))
> times4
      time1    time2     time3    time4 meantime
1 70.408543 48.92378  7.399605 95.93050 55.66561
2 17.231940 27.48530 82.962916 10.20619 34.47159
3 20.279220 10.33575 66.209290 30.71846 31.88568
4        NA 53.31993 12.398237 35.65782 33.79200
5  9.295965       NA 48.929201       NA 29.11258
6 63.966518 42.16304  1.777342       NA 35.96897

HTH

G

--
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Gavin Simpson                 [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,          [f] +44 (0)20 7679 0565
 Pearson Building,             [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London          [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT.                 [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
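On the original question about sd: base R provides rowMeans() and rowSums() but, as far as I know, no matching row-wise sd function. One possible workaround is to build the standard deviation from those two vectorised helpers. This is a rough sketch rather than a recommended implementation; `m`, `n`, `mu`, `ss` and `sdtime` are names made up for it.

```r
# Ben's example data, re-entered by hand so the sketch is self-contained.
times <- data.frame(
  time1 = c(70.408543, 17.231940, 20.279220, NA, 9.295965, 63.966518),
  time2 = c(48.92378, 27.48530, 10.33575, 53.31993, NA, 42.16304),
  time3 = c(7.399605, 82.962916, 66.209290, 12.398237, 48.929201, 1.777342),
  time4 = c(95.93050, 10.20619, 30.71846, 35.65782, NA, NA)
)

m  <- as.matrix(times)
n  <- rowSums(!is.na(m))         # non-missing values per row
mu <- rowMeans(m, na.rm = TRUE)  # row means, ignoring NA

# m - mu recycles mu down each column, so row i is centred on mu[i].
ss <- rowSums((m - mu)^2, na.rm = TRUE)

# Sample sd with n - 1 in the denominator, as sd() uses.
# A row with a single non-missing value gives NaN here, where sd() gives NA.
sdtime <- sqrt(ss / (n - 1))

times5 <- transform(times, meantime = mu, sdtime = sdtime)
```

Whether this is worth the extra code over apply(times, 1, sd, na.rm = TRUE) depends on the size of the data; for six rows the apply() version is easier to read.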