Dear All:
Urgent help is needed.
I have a data set in matrix format with three columns: X, Y, and an index of four
groups (1, 2, 3, 4). What I need to do is the following:
1- How can I subtract the sample mean of each group (indexed 1, 2, 3, 4) from the
corresponding data values of that group, and create new columns, say
"X - sample mean" and "Y - sample mean"? I tried to use "tapply", but I had some
difficulty putting the results back alongside the original data.
2- How can I use "tapply", if possible, or any other R function to find the
correlation coefficient between the X and Y columns for each group
(indexed 1, 2, 3, 4)? I could not do it with "tapply".
I attached part of the data as a txt file.
Thank you so much for your attention to this matter, and I look forward to hearing
from you soon.
Regards,
Abou
Data:
x y index
15807.24 12.5 4
15752.51 33.5 4
12893.76 01.5 3
8426.88 22.2 3
5706.24 333 3
3982.08 560 2
3642.62 670 2
295.68 124 1
215.40 104 1
195.40 204 1
4240.21 22.4 2
1222.72 45.9 2
1142.26 23.6 2
63.00 90.1 1
1216.00 82.4 2
2769.60 111 2
1790.46 34.7 2
26.10 26.10 1
19676.83 0.99 4
10920.60 203 3
6144.00 46 3
4534.48 4534.48 3
40000.00 65 4
29500.00 56 4
17100.00 77 4
9000.00 435 3
6300.00 84 3
3962.88 334 2
5690.00 653 3
3736.00 233 2
2750.00 22 2
1316.00 345 2
4595.00 4595.00 3
5928.00 45 3
2645.70 0.00 2
2580.24 454 2
6547.34 6547.34 3
1615.68 5 2
194.06 55 1
184.80 6 1
82.94 44 1
16649.00 56 4
4500.00 74 3
1600.00 744 2
================
AbouEl-Makarim Aboueissa, Ph.D.
Assistant Professor of Statistics
Department of Mathematics & Statistics
University of Southern Maine
96 Falmouth Street
P.O. Box 9300
Portland, ME 04104-9300
Tel: (207) 228-8389
Email: aaboueissa at usm.maine.edu
aboueiss at yahoo.com
Office: 301C Payson Smith
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: datatest.txt
Url:
https://stat.ethz.ch/pipermail/r-help/attachments/20070816/34988702/attachment.txt
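Try this:
t0 <- read.table("datatest.txt", header = TRUE)
X.mean <- ave(t0[, 1], as.factor(t0[, 3]))
X.mean holds the group mean for each row, so t0[, 1] - X.mean is the centered X column.
You can do the same for Y.mean and combine the results into a data.frame or whatever you prefer.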
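To round out the suggestion above, here is a minimal sketch of the ave() approach applied to both columns. It uses only the first ten rows of the posted data, typed inline so that it runs without datatest.txt. The column names x.centered and y.centered are illustrative choices, not something from the thread.

```r
# First ten rows of the data posted in the question (a subset, for brevity).
t0 <- read.table(header = TRUE, text = "
x y index
15807.24 12.5 4
15752.51 33.5 4
12893.76 1.5 3
8426.88 22.2 3
5706.24 333 3
3982.08 560 2
3642.62 670 2
295.68 124 1
215.40 104 1
195.40 204 1
")

# ave() returns the group mean repeated for every row of that group,
# so the result lines up with the original column and can be subtracted.
t0$x.centered <- t0$x - ave(t0$x, t0$index)
t0$y.centered <- t0$y - ave(t0$y, t0$index)

print(t0)
```

Because ave() keeps the original row order, the new columns can be attached directly and no merge step is needed.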
HTH,
Weiwei
--
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.
"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
For the 2nd item, perhaps:
by(df[,1:2], df$index, FUN=cor)
where df is your data.frame.
--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
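The by() call above returns a full 2 x 2 correlation matrix for each group. If only the single x-y coefficient per group is wanted, one possible variation is sketched below. It assumes the data frame is called df with columns x, y and index, and it uses the first twenty rows of the posted data, typed inline; the names r.by.group and r are illustrative only.

```r
# First twenty rows of the data posted in the question (a subset, for brevity).
df <- read.table(header = TRUE, text = "
x y index
15807.24 12.5 4
15752.51 33.5 4
12893.76 1.5 3
8426.88 22.2 3
5706.24 333 3
3982.08 560 2
3642.62 670 2
295.68 124 1
215.40 104 1
195.40 204 1
4240.21 22.4 2
1222.72 45.9 2
1142.26 23.6 2
63.00 90.1 1
1216.00 82.4 2
2769.60 111 2
1790.46 34.7 2
26.10 26.10 1
19676.83 0.99 4
10920.60 203 3
")

# by() hands each group's rows to the function as a small data frame;
# returning cor(g$x, g$y) gives one number per group instead of a matrix.
r.by.group <- by(df[, c("x", "y")], df$index, function(g) cor(g$x, g$y))

# Flatten the result to a plain numeric vector named by group label.
r <- setNames(as.vector(r.by.group), dimnames(r.by.group)[[1]])
print(r)
```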
On Thu, 2007-08-16 at 12:33 -0400, AbouEl-Makarim Aboueissa wrote:
> [snip]

I might be tempted to take the following approach:

If your data is a matrix, coerce it to a data frame first. Let's call that 'DF'.

> str(DF)
'data.frame':   44 obs. of  3 variables:
 $ x    : num  15807 15753 12894 8427 5706 ...
 $ y    : num  12.5 33.5 1.5 22.2 333 560 670 124 104 204 ...
 $ index: int  4 4 3 3 3 2 2 1 1 1 ...

Now use split() to break up the data frame into a list of 4 sub-dataframes, based upon the index value. We can use scale() within a lapply() loop to center the 'x' and 'y' columns for each sub-dataframe:

DF.ctr <- lapply(split(DF[, -3], DF$index), scale, scale = FALSE)

> str(DF.ctr)
List of 4
 $ 1: num [1:8, 1:2] 138.5 58.2 38.2 -94.2 -131.1 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:8] "8" "9" "10" "14" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 157.2 81.7
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
 $ 2: num [1:16, 1:2] 1469 1129 1727 -1291 -1371 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:16] "6" "7" "11" "12" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 2513 230
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
 $ 3: num [1:13, 1:2] 5879 1413 -1308 3906 -870 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:13] "3" "4" "5" "20" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 7014 1352
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
 $ 4: num [1:7, 1:2] -6262 -6317 -2393 17931 7431 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:7] "1" "2" "19" "23" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 22069 43
  .. ..- attr(*, "names")= chr [1:2] "x" "y"

Now, create a new single DF comprised of the sub-dataframes from DF.ctr:

DF.new <- do.call(rbind, DF.ctr)

Define colnames:

colnames(DF.new) <- c("x-mean", "y-mean")

> str(DF.new)
 num [1:44, 1:2] 138.5 58.2 38.2 -94.2 -131.1 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:44] "8" "9" "10" "14" ...
  ..$ : chr [1:2] "x-mean" "y-mean"

Now, use merge() to join DF and DF.new by the rownames:

DF.final <- merge(DF, DF.new, by = "row.names")

> DF.final
   Row.names        x       y index      x-mean       y-mean
1          1 15807.24   12.50     4 -6262.12857   -30.498571
2         10   195.40  204.00     1    38.22750   122.350000
3         11  4240.21   22.40     2  1726.93188  -208.037500
4         12  1222.72   45.90     2 -1290.55812  -184.537500
5         13  1142.26   23.60     2 -1371.01812  -206.837500
6         14    63.00   90.10     1   -94.17250     8.450000
7         15  1216.00   82.40     2 -1297.27812  -148.037500
8         16  2769.60  111.00     2   256.32188  -119.437500
9         17  1790.46   34.70     2  -722.81812  -195.737500
10        18    26.10   26.10     1  -131.07250   -55.550000
11        19 19676.83    0.99     4 -2392.53857   -42.008571
12         2 15752.51   33.50     4 -6316.85857    -9.498571
13        20 10920.60  203.00     3  3906.26923 -1148.809231
14        21  6144.00   46.00     3  -870.33077 -1305.809231
15        22  4534.48 4534.48     3 -2479.85077  3182.670769
16        23 40000.00   65.00     4 17930.63143    22.001429
17        24 29500.00   56.00     4  7430.63143    13.001429
18        25 17100.00   77.00     4 -4969.36857    34.001429
19        26  9000.00  435.00     3  1985.66923  -916.809231
20        27  6300.00   84.00     3  -714.33077 -1267.809231
21        28  3962.88  334.00     2  1449.60188   103.562500
22        29  5690.00  653.00     3 -1324.33077  -698.809231
23         3 12893.76    1.50     3  5879.42923 -1350.309231
24        30  3736.00  233.00     2  1222.72188     2.562500
25        31  2750.00   22.00     2   236.72188  -208.437500
26        32  1316.00  345.00     2 -1197.27812   114.562500
27        33  4595.00 4595.00     3 -2419.33077  3243.190769
28        34  5928.00   45.00     3 -1086.33077 -1306.809231
29        35  2645.70    0.00     2   132.42188  -230.437500
30        36  2580.24  454.00     2    66.96187   223.562500
31        37  6547.34 6547.34     3  -466.99077  5195.530769
32        38  1615.68    5.00     2  -897.59812  -225.437500
33        39   194.06   55.00     1    36.88750   -26.650000
34         4  8426.88   22.20     3  1412.54923 -1329.609231
35        40   184.80    6.00     1    27.62750   -75.650000
36        41    82.94   44.00     1   -74.23250   -37.650000
37        42 16649.00   56.00     4 -5420.36857    13.001429
38        43  4500.00   74.00     3 -2514.33077 -1277.809231
39        44  1600.00  744.00     2  -913.27812   513.562500
40         5  5706.24  333.00     3 -1308.09077 -1018.809231
41         6  3982.08  560.00     2  1468.80188   329.562500
42         7  3642.62  670.00     2  1129.34188   439.562500
43         8   295.68  124.00     1   138.50750    42.350000
44         9   215.40  104.00     1    58.22750    22.350000

With respect to getting the correlation coefficient for each sub-group, you can do the following:

> unlist(lapply(split(DF[, -3], DF$index), function(x) cor(x)[1, 2]))
         1          2          3          4
 0.4468744  0.2619220 -0.3608070  0.3848641

See ?split, ?lapply, ?scale, ?do.call, ?rbind, ?unlist, ?merge and ?cor

HTH,
Marc Schwartz
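Since the question asked specifically about tapply(), here is a rough sketch of how both items might be done with it. Note that the merge() result above comes back ordered by row name as text (1, 10, 11, ...), whereas this variant leaves the rows in their original order. It assumes a data frame DF with columns x, y and index, built here from the first twenty rows of the posted data; the names key, x.ctr, y.ctr and r are illustrative only.

```r
# First twenty rows of the data posted in the question (a subset, for brevity).
DF <- read.table(header = TRUE, text = "
x y index
15807.24 12.5 4
15752.51 33.5 4
12893.76 1.5 3
8426.88 22.2 3
5706.24 333 3
3982.08 560 2
3642.62 670 2
295.68 124 1
215.40 104 1
195.40 204 1
4240.21 22.4 2
1222.72 45.9 2
1142.26 23.6 2
63.00 90.1 1
1216.00 82.4 2
2769.60 111 2
1790.46 34.7 2
26.10 26.10 1
19676.83 0.99 4
10920.60 203 3
")

# Item 1: tapply() gives one mean per group, named by group label.
# Indexing that result by each row's label expands it to one value per row.
key <- as.character(DF$index)
DF$x.ctr <- DF$x - as.vector(tapply(DF$x, DF$index, mean)[key])
DF$y.ctr <- DF$y - as.vector(tapply(DF$y, DF$index, mean)[key])

# Item 2: tapply() splits only one vector, so split the row numbers and
# use them inside the function to pick the matching x and y values.
r <- tapply(seq_len(nrow(DF)), DF$index, function(i) cor(DF$x[i], DF$y[i]))

print(head(DF))
print(r)
```

The lookup by group label is the step that tends to cause trouble with tapply(): its result has one entry per group rather than one per row, so it has to be expanded before it can be subtracted from the original column.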