Thanoon,
You should still send the question to the R help list even though I helped
you with the code you are currently using. I will not always know the best
way, or even how, to proceed with some questions. As for your question
about the code below:
Firstly, there is no 'phi' method for cor in base R. If you are using it,
you must have neglected to include the package that provides it. However,
given that the phi coefficient is equal to the Pearson coefficient for
dichotomous data, you can use the 'pearson' method.
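As a quick check of that equivalence, here is a minimal sketch that computes
phi by hand from the 2x2 table and compares it with cor(). The helper name
phi_manual, the seed, and the simulated vectors x and y are only for
illustration; they are not part of your code.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- sample(0:1, 1000, replace = TRUE)
y <- sample(0:1, 1000, replace = TRUE)

# hypothetical helper: phi coefficient from the 2x2 contingency table
phi_manual <- function(x, y) {
  tab <- table(factor(x, levels = 0:1), factor(y, levels = 0:1))
  n00 <- tab[1, 1]; n01 <- tab[1, 2]
  n10 <- tab[2, 1]; n11 <- tab[2, 2]
  num <- n11 * n00 - n10 * n01
  # product of the four marginal totals, as doubles to avoid integer overflow
  den <- sqrt(prod(as.numeric(rowSums(tab)), as.numeric(colSums(tab))))
  num / den
}

phi_val <- phi_manual(x, y)
pearson_val <- cor(x, y, method = "pearson")
all.equal(phi_val, pearson_val)
```

So for 0/1 variables you lose nothing by calling cor() with the default
'pearson' method.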
Secondly, with respect to your primary concern: in this case, we have
randomly chosen which variables to correlate in each of two INDEPENDENT
DATASETS (i.e. different groups of samples). The idea with this code is that
R1 and R2 are datasets of 1000 samples and 10 variables. It would be
miraculous if they correlated with each other when each had its correlated
variables assigned at random. The code works correctly; the question now
becomes whether you want to see correlations across variables for all samples
(which this does for each DATASET) or whether you want the two DATASETS to be
correlated.
ords <- seq(0,1)
p <- 10
N <- 1000
percent_change <- 0.9
R1 <- as.data.frame(replicate(p, sample(ords, N, replace = TRUE)))
R2 <- as.data.frame(replicate(p, sample(ords, N, replace = TRUE)))
# phi is more appropriate for dichotomous data; base R cor() has no "phi"
# method, but Pearson gives the same value for 0/1 variables
cor(R1, method = "pearson")
cor(R2, method = "pearson")
# subset variable to have a stronger correlation
v1 <- R1[,1, drop = FALSE]
# note: this second assignment overwrites the one above, so v1 is R2's column 1
v1 <- R2[,1, drop = FALSE]
# randomly choose which rows to retain
keep <- sample(as.numeric(rownames(v1)), size = percent_change*nrow(v1))
change <- as.numeric(rownames(v1)[-keep])
# randomly choose new values for the rows being changed
# (length(change) avoids floating-point trouble with (1-percent_change)*N)
new.change <- sample(ords, length(change), replace = TRUE)
# replace values in copy of original column
v1.samp <- v1
v1.samp[change,] <- new.change
# closer correlation (Pearson equals phi for 0/1 data)
cor(v1, v1.samp, method = "pearson")
# set correlated column as one of your other columns
R1[,2] <- v1.samp
R2[,2] <- v1.samp
R1
R2
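If what you want is the second case, two DATASETS that are correlated with
each other, one option is to derive the second dataset from the first rather
than drawing it independently. This is only a minimal sketch under the same
0/1 coding; flip_prob, the seed, and the comparison dataset R3 are my own
assumptions for illustration, not something from your code.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
ords <- 0:1
p <- 10
N <- 1000
flip_prob <- 0.1  # assumed share of entries to redraw in the second dataset

R1 <- as.data.frame(replicate(p, sample(ords, N, replace = TRUE)))

# start R2 as a copy of R1, then redraw a random ~10% of each column
R2 <- R1
for (j in seq_len(p)) {
  change <- which(runif(N) < flip_prob)
  R2[change, j] <- sample(ords, length(change), replace = TRUE)
}

# an independently drawn dataset, for comparison
R3 <- as.data.frame(replicate(p, sample(ords, N, replace = TRUE)))

# correlation between matching columns across datasets
between_dep <- diag(cor(R1, R2))
between_ind <- diag(cor(R1, R3))
round(between_dep, 2)
round(between_ind, 2)
```

Note that a redrawn entry keeps its old value about half the time, so fewer
than 10% of the entries actually differ. The matching columns of R1 and R2
should come out strongly correlated, while those of R1 and R3 should sit near
zero, which is the situation you are seeing now.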
On Thu, Jul 31, 2014 at 7:29 AM, thanoon younis <thanoon.younis80@gmail.com> wrote:
> Dear Dr. Charles,
> I have a problem with the following R program for simulating data with 2
> different samples and with high correlation between variables in each
> sample. When I ran the program I got results, but without
> correlation between the samples.
> I appreciate your help and your time.
> I did not send this code to R-help because you helped me write it
> before.
>
> Many thanks to you,
>
> Thanoon
>
--
Dr. Charles Determan, PhD
Integrated Biosciences