Dear All:

Urgent help is needed.

I have a data set in matrix format with three columns: X, Y and an index of four groups (1,2,3,4). What I need to do is the following:

1- How can I subtract the sample mean of each group indexed 1,2,3,4 from the corresponding data values of that group and create new columns, say X-sample mean and Y-sample mean? I tried to use "tapply" but I have some difficulty putting the new data back together.

2- How can I use "tapply", if possible, or any other R function to find the correlation coefficient between the X and Y columns for each group indexed 1,2,3,4? I could not use "tapply".

I attached part of the data as a txt file.

Thank you so much for your attention to this matter, and I look forward to hearing from you soon.

Regards,

Abou

Data:
===
       x       y index
15807.24    12.5     4
15752.51    33.5     4
12893.76    01.5     3
 8426.88    22.2     3
 5706.24     333     3
 3982.08     560     2
 3642.62     670     2
  295.68     124     1
  215.40     104     1
  195.40     204     1
 4240.21    22.4     2
 1222.72    45.9     2
 1142.26    23.6     2
   63.00    90.1     1
 1216.00    82.4     2
 2769.60     111     2
 1790.46    34.7     2
   26.10   26.10     1
19676.83    0.99     4
10920.60     203     3
 6144.00      46     3
 4534.48 4534.48     3
40000.00      65     4
29500.00      56     4
17100.00      77     4
 9000.00     435     3
 6300.00      84     3
 3962.88     334     2
 5690.00     653     3
 3736.00     233     2
 2750.00      22     2
 1316.00     345     2
 4595.00 4595.00     3
 5928.00      45     3
 2645.70    0.00     2
 2580.24     454     2
 6547.34 6547.34     3
 1615.68       5     2
  194.06      55     1
  184.80       6     1
   82.94      44     1
16649.00      56     4
 4500.00      74     3
 1600.00     744     2
================

=========================
AbouEl-Makarim Aboueissa, Ph.D.
Assistant Professor of Statistics
Department of Mathematics & Statistics
University of Southern Maine
96 Falmouth Street
P.O. Box 9300
Portland, ME 04104-9300

Tel: (207) 228-8389
Email: aaboueissa at usm.maine.edu
       aboueiss at yahoo.com
Office: 301C Payson Smith

Attachment: datatest.txt
Url: https://stat.ethz.ch/pipermail/r-help/attachments/20070816/34988702/attachment.txt

Try this:

    t0 = read.table("datatest.txt", header=TRUE)
    X.mean = ave(t0[,1], as.factor(t0[,3]))

You do the rest for Y.mean and make them into a data.frame or whatever.

HTH,

Weiwei

On 8/16/07, AbouEl-Makarim Aboueissa <aaboueissa at usm.maine.edu> wrote:
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
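
One way to finish the ave() suggestion above, as a minimal sketch only: ave() returns the group mean repeated for every row of that group, so subtracting it centers each value while keeping the original row order. The small data frame below is just a handful of rows from the posted data, and the column names x.ctr and y.ctr are made up for illustration.

```r
# A few rows from the posted data (x, y, group index)
t0 <- data.frame(
  x     = c(15807.24, 15752.51, 295.68, 215.40, 195.40),
  y     = c(12.5, 33.5, 124, 104, 204),
  index = c(4, 4, 1, 1, 1)
)

# ave() returns, for every row, the mean of that row's group
# (its default FUN is mean)
X.mean <- ave(t0$x, t0$index)
Y.mean <- ave(t0$y, t0$index)

# Subtract the group means to get the centered columns
# (x.ctr and y.ctr are hypothetical names)
t0$x.ctr <- t0$x - X.mean
t0$y.ctr <- t0$y - Y.mean

print(t0)
```

Because nothing is split or re-joined, the new columns line up with the original rows without any merge step.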

For the 2nd item, perhaps:

    by(df[,1:2], df$index, FUN=cor)

where df is your data.frame.

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
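
To illustrate the by() suggestion above, here is a small sketch on a few rows of the posted data. by() hands each group's x and y columns to cor(), which returns a 2 x 2 correlation matrix per group; the sapply() step that pulls out the single x-y coefficient, and the names cors and r.xy, are additions for this example rather than part of the original reply.

```r
# A few rows from the posted data; 'df' is the name used in the reply
df <- data.frame(
  x     = c(295.68, 215.40, 195.40, 63.00, 3982.08, 3642.62, 4240.21, 1222.72),
  y     = c(124, 104, 204, 90.1, 560, 670, 22.4, 45.9),
  index = c(1, 1, 1, 1, 2, 2, 2, 2)
)

# One 2 x 2 correlation matrix per group
cors <- by(df[, c("x", "y")], df$index, FUN = cor)

# Keep only the x-y entry of each matrix (hypothetical helper step)
r.xy <- sapply(cors, function(m) m["x", "y"])
print(r.xy)
```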

On Thu, 2007-08-16 at 12:33 -0400, AbouEl-Makarim Aboueissa wrote:
> 1- How I can subtract the sample mean of each group indexed 1,2,3,4
> from the corresponding data values of this group and create new columns
> say X-sample mean and Y-sample mean? I tried to use the "tapply" but I
> have some difficulties to restore the new data
>
> 2- How I can use the "tapply" if possible or any other R-function to
> find the correlation coefficient between the X and Y columns for each
> group indexed 1,2,3,4.? Could not use the "tapply".

I might be tempted to take the following approach:

If your data is a matrix, coerce it to a data frame first. Let's call that 'DF'.

> str(DF)
'data.frame':   44 obs. of  3 variables:
 $ x    : num  15807 15753 12894 8427 5706 ...
 $ y    : num  12.5 33.5 1.5 22.2 333 560 670 124 104 204 ...
 $ index: int  4 4 3 3 3 2 2 1 1 1 ...
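
The coercion step mentioned above might look like the following sketch. The matrix name m and its four rows are assumptions made for the example (taken from the top of the posted data); as.data.frame() carries the matrix column names over to the data frame.

```r
# Hypothetical matrix 'm' holding a few rows of the posted data
m <- cbind(
  x     = c(15807.24, 15752.51, 12893.76, 8426.88),
  y     = c(12.5, 33.5, 1.5, 22.2),
  index = c(4, 4, 3, 3)
)

# Coerce to a data frame; column names come from the matrix
DF <- as.data.frame(m)
str(DF)
```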

Now use split() to break up the data frame into a list of 4 sub-dataframes, based upon the index value. We can use scale() within a lapply() loop to center the 'x' and 'y' columns for each sub-dataframe:

DF.ctr <- lapply(split(DF[, -3], DF$index), scale, scale = FALSE)

> str(DF.ctr)
List of 4
 $ 1: num [1:8, 1:2] 138.5 58.2 38.2 -94.2 -131.1 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:8] "8" "9" "10" "14" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 157.2 81.7
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
 $ 2: num [1:16, 1:2] 1469 1129 1727 -1291 -1371 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:16] "6" "7" "11" "12" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 2513 230
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
 $ 3: num [1:13, 1:2] 5879 1413 -1308 3906 -870 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:13] "3" "4" "5" "20" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 7014 1352
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
 $ 4: num [1:7, 1:2] -6262 -6317 -2393 17931 7431 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:7] "1" "2" "19" "23" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 22069 43
  .. ..- attr(*, "names")= chr [1:2] "x" "y"

Now, combine the centered pieces from DF.ctr into a single object (a matrix, since scale() returns matrices):

DF.new <- do.call(rbind, DF.ctr)

Define colnames:

colnames(DF.new) <- c("x-mean", "y-mean")

> str(DF.new)
 num [1:44, 1:2] 138.5 58.2 38.2 -94.2 -131.1 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:44] "8" "9" "10" "14" ...
  ..$ : chr [1:2] "x-mean" "y-mean"

Now, use merge() to join DF and DF.new by the rownames:

DF.final <- merge(DF, DF.new, by = "row.names")

> DF.final
   Row.names        x       y index      x-mean       y-mean
 1         1 15807.24   12.50     4 -6262.12857   -30.498571
 2        10   195.40  204.00     1    38.22750   122.350000
 3        11  4240.21   22.40     2  1726.93188  -208.037500
 4        12  1222.72   45.90     2 -1290.55812  -184.537500
 5        13  1142.26   23.60     2 -1371.01812  -206.837500
 6        14    63.00   90.10     1   -94.17250     8.450000
 7        15  1216.00   82.40     2 -1297.27812  -148.037500
 8        16  2769.60  111.00     2   256.32188  -119.437500
 9        17  1790.46   34.70     2  -722.81812  -195.737500
10        18    26.10   26.10     1  -131.07250   -55.550000
11        19 19676.83    0.99     4 -2392.53857   -42.008571
12         2 15752.51   33.50     4 -6316.85857    -9.498571
13        20 10920.60  203.00     3  3906.26923 -1148.809231
14        21  6144.00   46.00     3  -870.33077 -1305.809231
15        22  4534.48 4534.48     3 -2479.85077  3182.670769
16        23 40000.00   65.00     4 17930.63143    22.001429
17        24 29500.00   56.00     4  7430.63143    13.001429
18        25 17100.00   77.00     4 -4969.36857    34.001429
19        26  9000.00  435.00     3  1985.66923  -916.809231
20        27  6300.00   84.00     3  -714.33077 -1267.809231
21        28  3962.88  334.00     2  1449.60188   103.562500
22        29  5690.00  653.00     3 -1324.33077  -698.809231
23         3 12893.76    1.50     3  5879.42923 -1350.309231
24        30  3736.00  233.00     2  1222.72188     2.562500
25        31  2750.00   22.00     2   236.72188  -208.437500
26        32  1316.00  345.00     2 -1197.27812   114.562500
27        33  4595.00 4595.00     3 -2419.33077  3243.190769
28        34  5928.00   45.00     3 -1086.33077 -1306.809231
29        35  2645.70    0.00     2   132.42188  -230.437500
30        36  2580.24  454.00     2    66.96187   223.562500
31        37  6547.34 6547.34     3  -466.99077  5195.530769
32        38  1615.68    5.00     2  -897.59812  -225.437500
33        39   194.06   55.00     1    36.88750   -26.650000
34         4  8426.88   22.20     3  1412.54923 -1329.609231
35        40   184.80    6.00     1    27.62750   -75.650000
36        41    82.94   44.00     1   -74.23250   -37.650000
37        42 16649.00   56.00     4 -5420.36857    13.001429
38        43  4500.00   74.00     3 -2514.33077 -1277.809231
39        44  1600.00  744.00     2  -913.27812   513.562500
40         5  5706.24  333.00     3 -1308.09077 -1018.809231
41         6  3982.08  560.00     2  1468.80188   329.562500
42         7  3642.62  670.00     2  1129.34188   439.562500
43         8   295.68  124.00     1   138.50750    42.350000
44         9   215.40  104.00     1    58.22750    22.350000

With respect to getting the correlation coefficient for each sub-group, you can do the following:

> unlist(lapply(split(DF[, -3], DF$index), function(x) cor(x)[1, 2]))
         1          2          3          4
 0.4468744  0.2619220 -0.3608070  0.3848641

See ?split, ?lapply, ?scale, ?do.call, ?rbind, ?unlist, ?merge and ?cor

HTH,

Marc Schwartz
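
A side note on the merge() step: merge() matches on row names as character strings, which is why DF.final comes back ordered 1, 10, 11, ... rather than in the original row order. One possible alternative, sketched here on a few rows of the posted data, is unsplit(), which reverses split() and puts each centered piece back in its original position. The names pieces, ctr, x.ctr, y.ctr and DF.final2 are made up for this example.

```r
# A few rows from the posted data
DF <- data.frame(
  x     = c(15807.24, 15752.51, 295.68, 215.40, 195.40, 19676.83),
  y     = c(12.5, 33.5, 124, 104, 204, 0.99),
  index = c(4, 4, 1, 1, 1, 4)
)

# Center x and y within each group; scale() returns a matrix,
# so convert each piece back to a data frame for unsplit()
pieces <- lapply(split(DF[, c("x", "y")], DF$index),
                 function(d) as.data.frame(scale(d, scale = FALSE)))

# unsplit() reverses split(), restoring the original row order
ctr <- unsplit(pieces, DF$index)
names(ctr) <- c("x.ctr", "y.ctr")

# Bind the centered columns next to the original data
DF.final2 <- cbind(DF, ctr)
print(DF.final2)
```

This keeps the split()/scale() idea from the reply above and only swaps the do.call(rbind, ...) plus merge() steps for unsplit() plus cbind().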