Hi,
When I tried to merge two datasets (a many-to-many merge), I ran into a
problem: how to stop a possible loop in the sampling step.
###My code is as follows.###
data1<-matrix(data=c(1,1.2,1.3,"3/23/2004",1,1.5,2.3,"3/22/2004",2,0.2,3.3,"4/23/2004",3,1.5,1.3,"5/22/2004"),nrow=4,ncol=4,byrow=TRUE)
data1<-data.frame(data1);names(data1)<-c("areaid","x","y","date")
data2<-matrix(data=c(1,1.22,1.32,1, 1.53, 2.34,1, 1.21, 1.37,1, 1.52,
2.35,2, 0.21, 3.33,2, 0.23, 3.35,3, 1.57, 1.31,3, 1.59,
1.33),nrow=8,ncol=3,byrow=TRUE)
data2<-data.frame(data2);names(data2)<-c("areaid","x1","y1")
id<-unique(data1$areaid)
for (n in id) {
  data1_n <- data1[data1$areaid == n, ]
  data2_n <- data2[data2$areaid == n, ]
  leg1_n <- length(data1_n$areaid)
  leg2_n <- length(data2_n$areaid)
  if (leg1_n == 1) {
    # only one record in data1 for this areaid: a plain merge is enough
    merge(data1_n, data2_n, by.x = "areaid", by.y = "areaid")
  } else {
    # leg1_n >= 2
    # e.g. leg1_n = 2 and leg2_n = 4 for areaid = 1
    set.seed(1000)
    samp1_n <- sample(c(1:leg1_n), 1, replace = FALSE)
    data1_n_samp1 <- data1_n[samp1_n, ]
    samp2_n <- sample(c(1:leg2_n), 2, replace = FALSE)
    data2_n_samp2 <- data2_n[samp2_n, ]
    merge(data1_n_samp1, data2_n_samp2, by.x = "areaid", by.y = "areaid")
    # need to continue to sample from the remaining records in data1_n and data2_n?
    # some criterion to stop the sampling may be needed?
  }
}
# merge all of the datasets above to get the final result.
Any ideas or suggestions on the problem? Thanks a lot.
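
One way the stopping question might be sidestepped, as a rough sketch only: shuffle the data2 row positions once and deal them out in equal-sized groups, so that no repeated sampling (and no stop criterion) is needed. The sizes below are the areaid=1 values from this example, and the names shuffled and groups are made up for illustration. It assumes leg2_n is an exact multiple of leg1_n.

```r
set.seed(1000)
leg1_n <- 2   # number of data1 records in one areaid (areaid = 1 here)
leg2_n <- 4   # number of data2 records in the same areaid

# Shuffle the data2 row positions once ...
shuffled <- sample.int(leg2_n)
# ... then deal them out in equal-sized groups, one group per data1 record.
# Assumes leg2_n is an exact multiple of leg1_n.
groups <- split(shuffled, rep(seq_len(leg1_n), each = leg2_n / leg1_n))
groups
```

Each element of groups holds the data2 row positions for one data1 record, and every position is used exactly once.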
###My question explained in detail###
There are two datasets, data1 and data2.
######
data1<-matrix(data=c(1,1.2,1.3,"3/23/2004",1,1.5,2.3,"3/22/2004",2,0.2,3.3,"4/23/2004",3,1.5,1.3,"5/22/2004"),nrow=4,ncol=4,byrow=TRUE)
data1<-data.frame(data1)
names(data1)<-c("areaid","x","y","date")
data1
  areaid   x   y      date
1      1 1.2 1.3 3/23/2004
2      1 1.5 2.3 3/22/2004
3      2 0.2 3.3 4/23/2004
4      3 1.5 1.3 5/22/2004
######
data2<-matrix(data=c(1,1.22,1.32,1, 1.53, 2.34,1, 1.21, 1.37,1, 1.52,
2.35,2, 0.21, 3.33,2, 0.23, 3.35,3, 1.57, 1.31,3, 1.59,
1.33),nrow=8,ncol=3,byrow=TRUE)
data2<-data.frame(data2)
names(data2)<-c("areaid","x1","y1")
data2
  areaid   x1   y1
1      1 1.22 1.32
2      1 1.53 2.34
3      1 1.21 1.37
4      1 1.52 2.35
5      2 0.21 3.33
6      2 0.23 3.35
7      3 1.57 1.31
8      3 1.59 1.33
To explain the two datasets: you can treat data1 as the case dataset and data2
as the control dataset. Note that for each areaid, data2 has twice as many
records as data1, something like a 1:2 matched case-control study design.

I hope to merge data1 and data2. Take areaid=1 as an example. From the two
datasets, we can see that data1 has two points (x,y) in areaid=1 and data2 has
four points (x1,y1) in areaid=1. Each record in data1 should get two matched
records in data2. I want to randomly select half of the areaid=1 points in
data2 and link them to one areaid=1 record in data1, and link the other half
to the other areaid=1 record in data1. In my actual data, an areaid can have
more than two records in data1; this is only an example to explain the
problem. The cases of areaid=2 and areaid=3 are a little easier than areaid=1
because each has only one record in data1.

The key (match variable) is just areaid.
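
As a small sanity check of the 1:2 structure described above, the record counts per areaid could be compared before any matching is attempted. This is only a sketch on the areaid columns of the example data; n1, n2 and ratio are illustrative names.

```r
# areaid columns of the example datasets
data1 <- data.frame(areaid = c(1, 1, 2, 3))
data2 <- data.frame(areaid = c(1, 1, 1, 1, 2, 2, 3, 3))

# Count records per areaid, using the same areaid levels for both tables
n1 <- table(data1$areaid)
n2 <- table(factor(data2$areaid, levels = names(n1)))

# Controls per case in each areaid
ratio <- as.vector(n2 / n1)
names(ratio) <- names(n1)
ratio

# Stop early if any areaid does not have exactly two data2 records per data1 record
stopifnot(all(ratio == 2))
```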
The final result should look something like the following dataset.
areaid   x1   y1      date   x   y
     1 1.22 1.32 3/23/2004 1.2 1.3
     1 1.53 2.34 3/23/2004 1.2 1.3
     1 1.21 1.37 3/22/2004 1.5 2.3
     1 1.52 2.35 3/22/2004 1.5 2.3
     2 0.21 3.33 4/23/2004 0.2 3.3
     2 0.23 3.35 4/23/2004 0.2 3.3
     3 1.57 1.31 5/22/2004 1.5 1.3
     3 1.59 1.33 5/22/2004 1.5 1.3
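
A possible loop-free sketch of the matching described in this post, offered as one option and not a definitive answer: number the data1 records within each areaid, hand each data2 record one of those numbers at random in equal shares, then merge on areaid plus that number. The column name slot and the object names n_cases and matched are made up for this illustration. It assumes every areaid in data2 also occurs in data1, and it builds the data frames directly so that the coordinates stay numeric.

```r
# Example data from this post, built directly as data frames
data1 <- data.frame(
  areaid = c(1, 1, 2, 3),
  x      = c(1.2, 1.5, 0.2, 1.5),
  y      = c(1.3, 2.3, 3.3, 1.3),
  date   = c("3/23/2004", "3/22/2004", "4/23/2004", "5/22/2004"),
  stringsAsFactors = FALSE
)
data2 <- data.frame(
  areaid = c(1, 1, 1, 1, 2, 2, 3, 3),
  x1     = c(1.22, 1.53, 1.21, 1.52, 0.21, 0.23, 1.57, 1.59),
  y1     = c(1.32, 2.34, 1.37, 2.35, 3.33, 3.35, 1.31, 1.33)
)

set.seed(1000)

# Hypothetical helper column: number the data1 records within each areaid
data1$slot <- ave(seq_len(nrow(data1)), data1$areaid, FUN = seq_along)

# Number of data1 records per areaid
n_cases <- table(data1$areaid)

# Within each areaid, give every data2 record a slot number in random order,
# so that each data1 record receives an equal share of the data2 records.
# Assumes every areaid in data2 also occurs in data1.
data2$slot <- ave(seq_len(nrow(data2)), data2$areaid, FUN = function(i) {
  k <- n_cases[[as.character(data2$areaid[i[1]])]]
  slots <- rep(seq_len(k), length.out = length(i))
  slots[sample.int(length(i))]
})

# Merge on areaid and slot, then drop the helper column
matched <- merge(data1, data2, by = c("areaid", "slot"))
matched$slot <- NULL
matched
```

Because the slot numbers are shuffled once per areaid, every data2 record is used exactly once and nothing has to be resampled. Which data2 points end up with which data1 record depends on the seed.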
--
-----------------
Jane Chang
Queen's