Ogbos Okike
2018-Nov-28 04:15 UTC
[R] Applying a certain formula to a repeated sample data
Dear Jim,

I also wish to use the calculated means and apply a certain formula to the same data frame. In particular, I would like to subtract the mean of each seven-day block from each of the seven days in that block and divide the outcome by the same mean. If m1 represents the mean of each seven days in column 1, and c1 is the column 1 data, my formula will be of the form:

aa<-(c1-m1)/m1

I tried it on the first 7 rows and I have what I am looking for:

-0.0089986156
-0.0031149054
0.0042685741
0.0094600831
0.0019612367
-0.0044993078
0.0009229349

But doing it manually will take much time.

Many thanks for going a step further to assist me.

Warmest regards.
Ogbos

On Wed, Nov 28, 2018 at 4:31 AM Jim Lemon <drjimlemon at gmail.com> wrote:

> Hi Ogbos,
> If we assume that you have a 3 column data frame named oodf, how about:
>
> oodf[,4]<-floor((cumsum(oodf[,1])-1)/28)
> col2means<-by(oodf[,2],oodf[,4],mean)
> col3means<-by(oodf[,3],oodf[,4],mean)
>
> Jim
>
> On Wed, Nov 28, 2018 at 2:06 PM Ogbos Okike <giftedlife2014 at gmail.com> wrote:
> >
> > Dear List,
> > I have data with three columns. The data is of the form:
> > 1 8590 12516
> > 2 8641 98143
> > 3 8705 98916
> > 4 8750 89911
> > 5 8685 104835
> > 6 8629 121963
> > 7 8676 77655
> > 1 8577 81081
> > 2 8593 83385
> > 3 8642 112164
> > 4 8708 103684
> > 5 8622 83982
> > 6 8593 75944
> > 7 8600 97036
> > 1 8650 104911
> > 2 8730 114098
> > 3 8731 99421
> > 4 8715 85707
> > 5 8717 81273
> > 6 8739 106462
> > 7 8684 110635
> > 1 8713 105214
> > 2 8771 92456
> > 3 8759 109270
> > 4 8762 99150
> > 5 8730 77306
> > 6 8780 86324
> > 7 8804 90214
> > 1 8797 99894
> > 2 8863 95177
> > 3 8873 95910
> > 4 8827 108511
> > 5 8806 115636
> > 6 8869 85542
> > 7 8854 111018
> > 1 8571 93247
> > 2 8533 85105
> > 3 8553 114725
> > 4 8561 122195
> > 5 8532 100945
> > 6 8560 108552
> > 7 8634 108707
> > 1 8646 117420
> > 2 8633 113823
> > 3 8680 82763
> > 4 8765 121072
> > 5 8756 89835
> > 6 8750 104578
> > 7 8790 88429
> >
> > I wish to calculate the average of the second and third columns based on the
> > first column for each repeated 7 days. The length of the data is 1442. That
> > is 206 by 7. So I should arrive at 206 data points for each of the two
> > columns after calculating the mean of each group 1-7.
> >
> > I have tried both factor/tapply and aggregate functions but seem not to be
> > making progress.
> >
> > Thank you very much for your idea.
> >
> > Best wishes
> > Ogbos
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.

[[alternative HTML version deleted]]
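One way the grouping index from Jim's reply might be combined with the formula (c1-m1)/m1 is sketched below. It is only an illustration under stated assumptions: it uses the first two seven-day blocks of the posted sample, the column name CR and the helper names grp, block_means, m1 and aa are chosen here for illustration, and the posted values appear to correspond to the second column of the sample (the first data column after the day index).

```r
# First two seven-day blocks of the sample (day index and second column).
# The column name CR is an assumption made for this sketch.
oodf <- data.frame(
  n  = rep(1:7, times = 2),
  CR = c(8590, 8641, 8705, 8750, 8685, 8629, 8676,
         8577, 8593, 8642, 8708, 8622, 8593, 8600)
)

# Jim's grouping index: each complete 1..7 block sums to 28,
# so the running sum of the day index identifies the block (0, 1, ...).
grp <- floor((cumsum(oodf$n) - 1) / 28)

# One mean per block, then repeat each mean for every row of its block.
block_means <- tapply(oodf$CR, grp, mean)
m1 <- block_means[as.character(grp)]

# The formula from the post, applied to all rows at once.
aa <- as.numeric((oodf$CR - m1) / m1)
round(aa[1:7], 10)
```

The first seven elements of aa should agree with the seven values computed by hand in the message above. The index relies on every block being a complete run of 1 to 7; a block with a missing day would need a different grouping rule.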
Ogbos Okike
2018-Nov-28 05:17 UTC
[R] Applying a certain formula to a repeated sample data
Dear Jim,

I don't think my problem is clear the way I put it.

I have been trying to manually apply the formula to some rows.

This is what I have done. I cut and pasted the rows of each 1-7 block and saved each block in a different file, as shown below:

1 8590 12516
2 8641 98143
3 8705 98916
4 8750 89911
5 8685 104835
6 8629 121963
7 8676 77655

1 8577 81081
2 8593 83385
3 8642 112164
4 8708 103684
5 8622 83982
6 8593 75944
7 8600 97036

1 8650 104911
2 8730 114098
3 8731 99421
4 8715 85707
5 8717 81273
6 8739 106462
7 8684 110635

1 8713 105214
2 8771 92456
3 8759 109270
4 8762 99150
5 8730 77306
6 8780 86324
7 8804 90214

1 8797 99894
2 8863 95177
3 8873 95910
4 8827 108511
5 8806 115636
6 8869 85542
7 8854 111018

1 8571 93247
2 8533 85105
3 8553 114725
4 8561 122195
5 8532 100945
6 8560 108552
7 8634 108707

1 8646 117420
2 8633 113823
3 8680 82763
4 8765 121072
5 8756 89835
6 8750 104578
7 8790 88429

Each of them is then read as:

d1<-read.table("dat1",col.names=c("n","CR","WW"))
d2<-read.table("dat2",col.names=c("n","CR","WW"))
d3<-read.table("dat3",col.names=c("n","CR","WW"))
d4<-read.table("dat4",col.names=c("n","CR","WW"))
d5<-read.table("dat5",col.names=c("n","CR","WW"))
d6<-read.table("dat6",col.names=c("n","CR","WW"))
d7<-read.table("dat7",col.names=c("n","CR","WW"))

And my formula for percentage change is applied as follows for column 2:

a1<-((d1$CR-mean(d1$CR))/mean(d1$CR))*100
a2<-((d2$CR-mean(d2$CR))/mean(d2$CR))*100
a3<-((d3$CR-mean(d3$CR))/mean(d3$CR))*100
a4<-((d4$CR-mean(d4$CR))/mean(d4$CR))*100
a5<-((d5$CR-mean(d5$CR))/mean(d5$CR))*100
a6<-((d6$CR-mean(d6$CR))/mean(d6$CR))*100
a7<-((d7$CR-mean(d7$CR))/mean(d7$CR))*100

a1 to a7 give the percentage change in the data.

Instead of doing this one block after the other, can you please indicate how I may apply this formula to the whole data frame, preferably with some code?

Thank you again.

Best
Ogbos
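The manual procedure described in this message (one file per block, one formula line per file) could be mimicked in memory with split(), lapply() and unsplit(). The following is a rough sketch rather than a recommended solution: it assumes every block has exactly seven rows, uses only the first two blocks of the sample, and the names dat, block, pct_change, pieces and a are made up for the illustration.

```r
# First two seven-day blocks of the sample, with the column names from the post.
dat <- data.frame(
  n  = rep(1:7, times = 2),
  CR = c(8590, 8641, 8705, 8750, 8685, 8629, 8676,
         8577, 8593, 8642, 8708, 8622, 8593, 8600),
  WW = c(12516, 98143, 98916, 89911, 104835, 121963, 77655,
         81081, 83385, 112164, 103684, 83982, 75944, 97036)
)

# Assumption: blocks are always 7 rows long, so row position defines the block.
block <- (seq_len(nrow(dat)) - 1) %/% 7 + 1

# Percentage change relative to the block's own mean.
pct_change <- function(x) (x - mean(x)) / mean(x) * 100

# split() plays the role of the separate files dat1, dat2, ...
pieces <- split(dat$CR, block)

# a[[1]] corresponds to a1, a[[2]] to a2, and so on.
a <- lapply(pieces, pct_change)

# Put the per-block results back into the original row order.
dat$CRpct <- unsplit(a, block)
dat
```

With the full data set the same lines would produce one list element per seven-day block, so no intermediate files are needed.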
Jeff Newmiller
2018-Nov-28 06:10 UTC
[R] Applying a certain formula to a repeated sample data
Thank you for providing a clarifying example. I think a useful function for you to get familiar with is the "ave" function. It is kind of like aggregate, except that it works when the operation you want to apply to the group of elements returns the same number of elements as were given to it.

Also, in the future please figure out how to tell gmail to send plain text to the mailing list instead of HTML. You were lucky this time, but often HTML email gets horribly mangled as it goes through the mailing list and gets all the formatting removed.

###############################
dta <- read.table( text =
"n CR WW
1 8590 12516
2 8641 98143
3 8705 98916
4 8750 89911
5 8685 104835
6 8629 121963
7 8676 77655
1 8577 81081
2 8593 83385
3 8642 112164
4 8708 103684
5 8622 83982
6 8593 75944
7 8600 97036
1 8650 104911
2 8730 114098
3 8731 99421
4 8715 85707
5 8717 81273
6 8739 106462
7 8684 110635
1 8713 105214
2 8771 92456
3 8759 109270
4 8762 99150
5 8730 77306
6 8780 86324
7 8804 90214
1 8797 99894
2 8863 95177
3 8873 95910
4 8827 108511
5 8806 115636
6 8869 85542
7 8854 111018
1 8571 93247
2 8533 85105
3 8553 114725
4 8561 122195
5 8532 100945
6 8560 108552
7 8634 108707
1 8646 117420
2 8633 113823
3 8680 82763
4 8765 121072
5 8756 89835
6 8750 104578
7 8790 88429
",header=TRUE)

# one way to make a grouping vector:
# a new group starts whenever the day index drops (7 -> 1)
dta$G <- cumsum( c( 1, diff( dta$n ) < 0 ) )

# your operation: percentage deviation from the mean of x
fn <- function( x ) {
  m <- mean( x )
  ( x - m ) / m * 100
}

# your operation, computing for each group
gn <- function( x, g ) {
  ave( x, g, FUN = fn )
}

# do the computations
dta$CRpct <- gn( dta$CR, dta$G )
dta$WWpct <- gn( dta$WW, dta$G )

dta
#>    n   CR     WW G        CRpct       WWpct
#> 1  1 8590  12516 1 -0.899861560 -85.4932369
#> 2  2 8641  98143 1 -0.311490540  13.7533758
#> 3  3 8705  98916 1  0.426857407  14.6493272
#> 4  4 8750  89911 1  0.946008306   4.2120148
#> 5  5 8685 104835 1  0.196123673  21.5097882
#> 6  6 8629 121963 1 -0.449930780  41.3621243
#> 7  7 8676  77655 1  0.092293493  -9.9933934
#> 8  1 8577  81081 2 -0.490594182 -10.9385886
#> 9  2 8593  83385 2 -0.304963951  -8.4078170
#> 10 3 8642 112164 2  0.263528632  23.2037610
#> 11 4 8708 103684 2  1.029253336  13.8891155
#> 12 5 8622  83982 2  0.031490843  -7.7520572
#> 13 6 8593  75944 2 -0.304963951 -16.5811987
#> 14 7 8600  97036 2 -0.223750725   6.5867850
#> 15 1 8650 104911 3 -0.682347538   4.5366096
#> 16 2 8730 114098 3  0.236197225  13.6908244
#> 17 3 8731  99421 3  0.247679034  -0.9337985
#> 18 4 8715  85707 3  0.063970082 -14.5988581
#> 19 5 8717  81273 3  0.086933701 -19.0170347
#> 20 6 8739 106462 3  0.339533510   6.0820746
#> 21 7 8684 110635 3 -0.291966014  10.2401827
#> 22 1 8713 105214 4 -0.534907614  11.6017662
#> 23 2 8771  92456 4  0.127203640  -1.9307991
#> 24 3 8759 109270 4 -0.009784895  15.9040146
#> 25 4 8762  99150 4  0.024462238   5.1696079
#> 26 5 8730  77306 4 -0.340840523 -18.0005879
#> 27 6 8780  86324 4  0.229945042  -8.4350859
#> 28 7 8804  90214 4  0.503922112  -4.3089157
#> 29 1 8797  99894 5 -0.500896767  -1.7465519
#> 30 2 8863  95177 5  0.245600995  -6.3860849
#> 31 3 8873  95910 5  0.358706717  -5.6651229
#> 32 4 8827 108511 5 -0.161579602   6.7289318
#> 33 5 8806 115636 5 -0.399101617  13.7369184
#> 34 6 8869  85542 5  0.313464428 -15.8628500
#> 35 7 8854 111018 5  0.143805846   9.1947595
#> 36 1 8571  93247 6  0.088415855 -11.0088128
#> 37 2 8533  85105 6 -0.355331643 -18.7792102
#> 38 3 8553 114725 6 -0.121780328   9.4889267
#> 39 4 8561 122195 6 -0.028359802  16.6179943
#> 40 5 8532 100945 6 -0.367009209  -3.6621512
#> 41 6 8560 108552 6 -0.040037368   3.5976637
#> 42 7 8634 108707 6  0.824102496   3.7455895
#> 43 1 8646 117420 7 -0.816125860  14.4890796
#> 44 2 8633 113823 7 -0.965257293  10.9818643
#> 45 3 8680  82763 7 -0.426089807 -19.3028471
#> 46 4 8765 121072 7  0.549000328  18.0499220
#> 47 5 8756  89835 7  0.445755490 -12.4073713
#> 48 6 8750 104578 7  0.376925598   1.9676287
#> 49 7 8790  88429 7  0.835791544 -13.7782761

#' Created on 2018-11-27 by the [reprex package](http://reprex.tidyverse.org) (v0.2.0).
###############################

---------------------------------------------------------------------------
Jeff Newmiller                        The     .....       .....  Go Live...
DCN:<jdnewmil at dcn.davis.ca.us>        Basics: ##.#.       ##.#.  Live Go...
                                      Live:   OO#.. Dead: OO#..  Playing
Research Engineer (Solar/Batteries            O.O#.       #.O#.  with
/Software/Embedded Controllers)               .OO#.       .OO#.  rocks...1k
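The distinction drawn in this reply between aggregate-style functions and ave() can be seen side by side in a small sketch. It is illustrative only: it uses the first two blocks of the sample and the grouping vector from the reply, and the names group_means and row_means are introduced here.

```r
# First two seven-day blocks of the sample (day index and CR column).
dta <- data.frame(
  n  = rep(1:7, times = 2),
  CR = c(8590, 8641, 8705, 8750, 8685, 8629, 8676,
         8577, 8593, 8642, 8708, 8622, 8593, 8600)
)

# Grouping vector as in the reply: a new group starts when the day index drops.
dta$G <- cumsum(c(1, diff(dta$n) < 0))

# tapply() collapses each group to a single value (one mean per block),
# which is what the first message in the thread asked for.
group_means <- tapply(dta$CR, dta$G, mean)

# ave() returns one value per row: each block's mean repeated for its rows.
# Its FUN argument defaults to mean.
row_means <- ave(dta$CR, dta$G)

# Because row_means lines up with the data, the formula is a plain vector expression.
dta$CRpct <- (dta$CR - row_means) / row_means * 100
dta
```

On the full 1442-row data set, group_means would hold 206 values while row_means would still have 1442, which is why ave() suits the per-row percentage calculation.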