thr3ads.net - R help - [R] Urgent Help needed [Aug 2007]


AbouEl-Makarim Aboueissa

2007-Aug-16 16:33 UTC

[R] Urgent Help needed

Dear All:

Urgent help is needed.


I have a data set in matrix format with three columns: X, Y, and an index of
four groups (1, 2, 3, 4). What I need to do is the following:

1- How can I subtract the sample mean of each group (indexed 1, 2, 3, 4) from
   the corresponding data values of that group, and create new columns, say
   X - sample mean and Y - sample mean? I tried to use tapply(), but I have
   some difficulty putting the results back into the data.

2- How can I use tapply(), if possible, or any other R function to find the
   correlation coefficient between the X and Y columns for each group
   (indexed 1, 2, 3, 4)? I could not get tapply() to do this.
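A minimal sketch of both items with tapply() itself, assuming the data are in a data frame DF with columns x, y and index (the column names of the attached file) and using only the first 12 rows of the posted data. For item 1, tapply() returns one mean per group, so the means have to be expanded back to one value per row by indexing with the group codes; for item 2, tapply() can be applied to the row numbers, so that each call sees the rows of one group:

```r
# First 12 rows of the posted data (excerpt only; group 4 has just 2 rows here)
DF <- data.frame(
  x = c(15807.24, 15752.51, 12893.76, 8426.88, 5706.24, 3982.08,
        3642.62, 295.68, 215.40, 195.40, 4240.21, 1222.72),
  y = c(12.5, 33.5, 1.5, 22.2, 333, 560, 670, 124, 104, 204, 22.4, 45.9),
  index = c(4, 4, 3, 3, 3, 2, 2, 1, 1, 1, 2, 2)
)

g <- factor(DF$index)  # groups; levels sorted "1", "2", "3", "4"

# Item 1: one mean per group, expanded back to one value per row
# through the integer level codes of g
x.means <- tapply(DF$x, g, mean)
y.means <- tapply(DF$y, g, mean)
DF$x.ctr <- DF$x - as.vector(x.means)[as.integer(g)]
DF$y.ctr <- DF$y - as.vector(y.means)[as.integer(g)]
print(DF)

# Item 2: apply tapply() to the row numbers, so each call
# receives the rows belonging to one group
r <- tapply(seq_len(nrow(DF)), g, function(i) cor(DF$x[i], DF$y[i]))
print(r)
```

The as.vector() call drops the dimnames of the tapply() result, so the new columns are plain numeric vectors.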


I attached part of the data as a txt file.


Thank you so much for your attention to this matter, and I look forward to
hearing from you soon.

Regards,

Abou


Data:
================
x         y        index
15807.24  12.5     4
15752.51  33.5     4
12893.76  01.5     3
8426.88   22.2     3
5706.24   333      3
3982.08   560      2
3642.62   670      2
295.68    124      1
215.40    104      1
195.40    204      1
4240.21   22.4     2
1222.72   45.9     2
1142.26   23.6     2
63.00     90.1     1
1216.00   82.4     2
2769.60   111      2
1790.46   34.7     2
26.10     26.10    1
19676.83  0.99     4
10920.60  203      3
6144.00   46       3
4534.48   4534.48  3
40000.00  65       4
29500.00  56       4
17100.00  77       4
9000.00   435      3
6300.00   84       3
3962.88   334      2
5690.00   653      3
3736.00   233      2
2750.00   22       2
1316.00   345      2
4595.00   4595.00  3
5928.00   45       3
2645.70   0.00     2
2580.24   454      2
6547.34   6547.34  3
1615.68   5        2
194.06    55       1
184.80    6        1
82.94     44       1
16649.00  56       4
4500.00   74       3
1600.00   744      2
================


=========================
AbouEl-Makarim Aboueissa, Ph.D.
Assistant Professor of Statistics
Department of Mathematics & Statistics
University of Southern Maine
96 Falmouth Street
P.O. Box 9300
Portland, ME 04104-9300

Tel: (207) 228-8389
Email: aaboueissa at usm.maine.edu
          aboueiss at yahoo.com
Office: 301C Payson Smith

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: datatest.txt
Url: https://stat.ethz.ch/pipermail/r-help/attachments/20070816/34988702/attachment.txt

Weiwei Shi

2007-Aug-16 17:27 UTC


[R] Urgent Help needed

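try this:

t0 = read.table("datatest.txt", header = TRUE)
X.mean = ave(t0[, 1], as.factor(t0[, 3]))  # group mean of x, repeated for every row

You can do the rest (Y.mean, and subtracting the means from x and y) the same
way and put the results into a data.frame or whatever you need.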
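Completing this suggestion: ave() with its default FUN = mean returns each row's group mean, so subtracting it from the original column gives the centered values directly. A minimal sketch that reads the first 12 rows of the posted data from an inline string; read.table(text = ...) stands in for the attached datatest.txt, which is not reproduced here, and the column names x.ctr and y.ctr are only illustrative:

```r
# First 12 rows of the posted data, in the same layout as datatest.txt
txt <- "x        y     index
15807.24 12.5  4
15752.51 33.5  4
12893.76 01.5  3
8426.88  22.2  3
5706.24  333   3
3982.08  560   2
3642.62  670   2
295.68   124   1
215.40   104   1
195.40   204   1
4240.21  22.4  2
1222.72  45.9  2"
t0 <- read.table(text = txt, header = TRUE)

# ave() returns the group mean for every row, so the difference is centered
t0$x.ctr <- t0$x - ave(t0$x, t0$index)
t0$y.ctr <- t0$y - ave(t0$y, t0$index)
print(t0)
```

Because ave() keeps the original row order, no merging or reordering is needed afterwards.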

HTH,

Weiwei


-- 
Weiwei Shi, Ph.D
Research Scientist
GeneGO, Inc.

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

Henrique Dallazuanna

2007-Aug-16 18:05 UTC


[R] Urgent Help needed

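For the 2nd item, perhaps:

by(df[, 1:2], df$index, FUN = cor)

where df is your data.frame. This returns a 2 x 2 correlation matrix for each
group; the off-diagonal element is the correlation between X and Y.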
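A sketch of how to pull plain numbers out of the by() result, assuming the data frame is called DF rather than df (df() is also an existing function in the stats package) and using the first 12 rows of the posted data. Each element of the result is a 2 x 2 matrix, and its [1, 2] entry is the X-Y correlation:

```r
# First 12 rows of the posted data (excerpt only)
DF <- data.frame(
  x = c(15807.24, 15752.51, 12893.76, 8426.88, 5706.24, 3982.08,
        3642.62, 295.68, 215.40, 195.40, 4240.21, 1222.72),
  y = c(12.5, 33.5, 1.5, 22.2, 333, 560, 670, 124, 104, 204, 22.4, 45.9),
  index = c(4, 4, 3, 3, 3, 2, 2, 1, 1, 1, 2, 2)
)

# One 2 x 2 correlation matrix per group
cor.mats <- by(DF[, c("x", "y")], DF$index, FUN = cor)

# Keep only the off-diagonal element of each matrix
r <- sapply(cor.mats, function(m) m[1, 2])
print(r)
```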

-- 
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O


Marc Schwartz

2007-Aug-16 19:02 UTC


[R] Urgent Help needed

I might be tempted to take the following approach:

If your data is a matrix, coerce it to a data frame first. Let's call
that 'DF'.

> str(DF)
'data.frame':   44 obs. of  3 variables:
 $ x    : num  15807 15753 12894  8427  5706 ...
 $ y    : num  12.5 33.5 1.5 22.2 333 560 670 124 104 204 ...
 $ index: int  4 4 3 3 3 2 2 1 1 1 ...
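If the data start out as a numeric matrix, a minimal sketch of the coercion; the matrix name m and the 12-row excerpt of the posted data are only for illustration:

```r
# Hypothetical matrix holding the first 12 rows of the posted data
m <- cbind(
  x = c(15807.24, 15752.51, 12893.76, 8426.88, 5706.24, 3982.08,
        3642.62, 295.68, 215.40, 195.40, 4240.21, 1222.72),
  y = c(12.5, 33.5, 1.5, 22.2, 333, 560, 670, 124, 104, 204, 22.4, 45.9),
  index = c(4, 4, 3, 3, 3, 2, 2, 1, 1, 1, 2, 2)
)

# The column names carry over; index stays double here, whereas
# read.table() produced an integer column in the str() output above
DF <- as.data.frame(m)
str(DF)
```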


Now use split() to break up the data frame into a list of 4
sub-dataframes, based upon the index value. We can use scale() within an
lapply() loop to center the 'x' and 'y' columns of each
sub-dataframe:


DF.ctr <- lapply(split(DF[, -3], DF$index), scale, scale = FALSE)

> str(DF.ctr)
List of 4
 $ 1: num [1:8, 1:2]  138.5   58.2   38.2  -94.2 -131.1 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:8] "8" "9" "10" "14" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 157.2  81.7
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
 $ 2: num [1:16, 1:2]  1469  1129  1727 -1291 -1371 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:16] "6" "7" "11" "12" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 2513  230
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
 $ 3: num [1:13, 1:2]  5879  1413 -1308  3906  -870 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:13] "3" "4" "5" "20" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 7014 1352
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
 $ 4: num [1:7, 1:2] -6262 -6317 -2393 17931  7431 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:7] "1" "2" "19" "23" ...
  .. ..$ : chr [1:2] "x" "y"
  ..- attr(*, "scaled:center")= Named num [1:2] 22069    43
  .. ..- attr(*, "names")= chr [1:2] "x" "y"
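With scale = FALSE, scale() only subtracts the column means: it does not divide by the standard deviations, and it stores the means it used in the "scaled:center" attribute shown in the str() output. A quick check of that behaviour on the first 12 rows of the posted data (the names ctr1 and the excerpt are illustrative):

```r
# First 12 rows of the posted data (excerpt only)
DF <- data.frame(
  x = c(15807.24, 15752.51, 12893.76, 8426.88, 5706.24, 3982.08,
        3642.62, 295.68, 215.40, 195.40, 4240.21, 1222.72),
  y = c(12.5, 33.5, 1.5, 22.2, 333, 560, 670, 124, 104, 204, 22.4, 45.9),
  index = c(4, 4, 3, 3, 3, 2, 2, 1, 1, 1, 2, 2)
)

DF.ctr <- lapply(split(DF[, -3], DF$index), scale, scale = FALSE)

# The stored centers are the group means of x and y
ctr1 <- attr(DF.ctr[["1"]], "scaled:center")
print(ctr1)

# Centering only: the within-group SDs are unchanged
print(apply(DF.ctr[["1"]], 2, sd))
print(sapply(DF[DF$index == 1, c("x", "y")], sd))
```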


Now, stack the centered sub-matrices in DF.ctr into a single matrix (scale() returns matrices, not data frames):

DF.new <- do.call(rbind, DF.ctr)


Define colnames:

colnames(DF.new) <- c("x-mean", "y-mean")

> str(DF.new)
 num [1:44, 1:2]  138.5   58.2   38.2  -94.2 -131.1 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:44] "8" "9" "10" "14" ...
  ..$ : chr [1:2] "x-mean" "y-mean"


Now, use merge() to join DF and DF.new by the rownames:

DF.final <- merge(DF, DF.new, by = "row.names")
> DF.final
   Row.names        x       y index      x-mean       y-mean
1          1 15807.24   12.50     4 -6262.12857   -30.498571
2         10   195.40  204.00     1    38.22750   122.350000
3         11  4240.21   22.40     2  1726.93188  -208.037500
4         12  1222.72   45.90     2 -1290.55812  -184.537500
5         13  1142.26   23.60     2 -1371.01812  -206.837500
6         14    63.00   90.10     1   -94.17250     8.450000
7         15  1216.00   82.40     2 -1297.27812  -148.037500
8         16  2769.60  111.00     2   256.32188  -119.437500
9         17  1790.46   34.70     2  -722.81812  -195.737500
10        18    26.10   26.10     1  -131.07250   -55.550000
11        19 19676.83    0.99     4 -2392.53857   -42.008571
12         2 15752.51   33.50     4 -6316.85857    -9.498571
13        20 10920.60  203.00     3  3906.26923 -1148.809231
14        21  6144.00   46.00     3  -870.33077 -1305.809231
15        22  4534.48 4534.48     3 -2479.85077  3182.670769
16        23 40000.00   65.00     4 17930.63143    22.001429
17        24 29500.00   56.00     4  7430.63143    13.001429
18        25 17100.00   77.00     4 -4969.36857    34.001429
19        26  9000.00  435.00     3  1985.66923  -916.809231
20        27  6300.00   84.00     3  -714.33077 -1267.809231
21        28  3962.88  334.00     2  1449.60188   103.562500
22        29  5690.00  653.00     3 -1324.33077  -698.809231
23         3 12893.76    1.50     3  5879.42923 -1350.309231
24        30  3736.00  233.00     2  1222.72188     2.562500
25        31  2750.00   22.00     2   236.72188  -208.437500
26        32  1316.00  345.00     2 -1197.27812   114.562500
27        33  4595.00 4595.00     3 -2419.33077  3243.190769
28        34  5928.00   45.00     3 -1086.33077 -1306.809231
29        35  2645.70    0.00     2   132.42188  -230.437500
30        36  2580.24  454.00     2    66.96187   223.562500
31        37  6547.34 6547.34     3  -466.99077  5195.530769
32        38  1615.68    5.00     2  -897.59812  -225.437500
33        39   194.06   55.00     1    36.88750   -26.650000
34         4  8426.88   22.20     3  1412.54923 -1329.609231
35        40   184.80    6.00     1    27.62750   -75.650000
36        41    82.94   44.00     1   -74.23250   -37.650000
37        42 16649.00   56.00     4 -5420.36857    13.001429
38        43  4500.00   74.00     3 -2514.33077 -1277.809231
39        44  1600.00  744.00     2  -913.27812   513.562500
40         5  5706.24  333.00     3 -1308.09077 -1018.809231
41         6  3982.08  560.00     2  1468.80188   329.562500
42         7  3642.62  670.00     2  1129.34188   439.562500
43         8   295.68  124.00     1   138.50750    42.350000
44         9   215.40  104.00     1    58.22750    22.350000
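merge() sorts its result on the by column, and because Row.names holds text, the rows come out in the order "1", "10", "11", ..., "2", ... as in the listing above. A minimal sketch, on the first 12 rows of the posted data, of running the same steps and then restoring the original row order:

```r
# First 12 rows of the posted data (excerpt only)
DF <- data.frame(
  x = c(15807.24, 15752.51, 12893.76, 8426.88, 5706.24, 3982.08,
        3642.62, 295.68, 215.40, 195.40, 4240.21, 1222.72),
  y = c(12.5, 33.5, 1.5, 22.2, 333, 560, 670, 124, 104, 204, 22.4, 45.9),
  index = c(4, 4, 3, 3, 3, 2, 2, 1, 1, 1, 2, 2)
)

# Same steps as above
DF.ctr <- lapply(split(DF[, -3], DF$index), scale, scale = FALSE)
DF.new <- do.call(rbind, DF.ctr)
colnames(DF.new) <- c("x-mean", "y-mean")
DF.final <- merge(DF, DF.new, by = "row.names")

# Row.names holds the original row numbers as text; sort them numerically
DF.final <- DF.final[order(as.integer(DF.final$Row.names)), ]
rownames(DF.final) <- NULL
print(DF.final)
```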



With respect to getting the correlation coefficient for each sub-group,
you can do the following:
> unlist(lapply(split(DF[, -3], DF$index), function(x) cor(x)[1, 2]))
         1          2          3          4 
 0.4468744  0.2619220 -0.3608070  0.3848641
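A variant of the same idea with vapply(), which states up front that each group must yield a single number (numeric(1)) and stops with an error otherwise, instead of relying on unlist() to flatten whatever lapply() returns. As before, this is a sketch on the first 12 rows of the posted data:

```r
# First 12 rows of the posted data (excerpt only)
DF <- data.frame(
  x = c(15807.24, 15752.51, 12893.76, 8426.88, 5706.24, 3982.08,
        3642.62, 295.68, 215.40, 195.40, 4240.21, 1222.72),
  y = c(12.5, 33.5, 1.5, 22.2, 333, 560, 670, 124, 104, 204, 22.4, 45.9),
  index = c(4, 4, 3, 3, 3, 2, 2, 1, 1, 1, 2, 2)
)

# One correlation per group; the names come from the split() list
r <- vapply(split(DF[, c("x", "y")], DF$index),
            function(d) cor(d$x, d$y),
            numeric(1))
print(r)
```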


See ?split, ?lapply, ?scale, ?do.call, ?rbind, ?unlist, ?merge and ?cor

HTH,

Marc Schwartz
