Hi,
My problem may be a little complicated, so please forgive me if the
explanation below is long.
# data set
# (built directly as a data frame so that x, y and var5 stay numeric;
# going through matrix() would coerce every column to character)
a0 <- data.frame(
  x     = c(1.1, 1.3, 1.1, 1.3, 1.3),
  y     = c(2.0, 1.8, 2.0, 1.8, 1.8),
  date  = as.Date(c("12/01/2008", "05/20/2007", "12/06/2008",
                    "05/10/2007", "05/06/2007"), format = "%m/%d/%Y"),
  prior = c("N", "N", "A", "C", "A"),
  var5  = 1:5,
  stringsAsFactors = FALSE
)
> a0
    x   y       date prior var5
1 1.1 2.0 2008-12-01     N    1
2 1.3 1.8 2007-05-20     N    2
3 1.1 2.0 2008-12-06     A    3
4 1.3 1.8 2007-05-10     C    4
5 1.3 1.8 2007-05-06     A    5
I want to select observations based on the values of x and y, while also
taking the values of "prior" into account. The rough idea is as follows.
# Check whether any observations have the same x and y values (e.g. obs 1
and obs 3). If they do, the difference between their dates needs to be
checked. If the date difference is <= 8 days, we only need to keep the
earliest observation.
# During this selection, the variable "prior" also needs to be considered.
For "prior", the priority is C > A > N. Within such a set of observations,
if the earliest observation has a lower-priority value than a later one
(e.g. the earliest has "N" and later ones have "A" or "C"), then we also
need to replace the earliest observation's value with the highest-priority
value in the set.
# If the date difference is > 8 days, the above manipulation is not needed.
Keep the observations as they are.
# Also keep all other observations that do not share their x and y values
with any other observation.
# So the final result should be
    x   y       date prior var5
1 1.1 2.0 2008-12-01     A    1
5 1.3 1.8 2007-05-06     C    5
2 1.3 1.8 2007-05-20     N    2
# obs 1 and obs 3 have the same x and y. Their date difference is 5 days,
so we select the earliest, obs 1. But its prior value is N while the
other's is A, so we also need to replace N with A.
# obs 2, obs 4 and obs 5 have the same x and y. obs 4 and obs 5 have a date
difference of 4 days (<= 8), so the earliest, obs 5, is selected, and we
also need to replace its A with C because of the priority. If the prior
value of obs 4 were N, we would not need to replace the A of obs 5, because
A has priority over N. obs 2 is more than 8 days later than both obs 4 and
obs 5, so it is kept unchanged.
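
The rules above could be sketched in base R roughly as follows. This is
only a minimal sketch under some assumptions that the description does not
settle: observations with the same x and y are sorted by date, and a new
cluster starts whenever the gap to the previous observation is more than
8 days (measuring from the earliest observation of the cluster instead
would need a different rule). The names max_gap, prior_levels and
collapse_group are made up for illustration, and x and y are compared for
exact equality.

```r
# Example data from the post, built directly as a data frame.
a0 <- data.frame(
  x     = c(1.1, 1.3, 1.1, 1.3, 1.3),
  y     = c(2.0, 1.8, 2.0, 1.8, 1.8),
  date  = as.Date(c("12/01/2008", "05/20/2007", "12/06/2008",
                    "05/10/2007", "05/06/2007"), format = "%m/%d/%Y"),
  prior = c("N", "N", "A", "C", "A"),
  var5  = 1:5,
  stringsAsFactors = FALSE
)

max_gap      <- 8                  # days; threshold stated in the post
prior_levels <- c("N", "A", "C")   # lowest to highest priority (C > A > N)

# Collapse one (x, y) group (hypothetical helper):
#  - sort by date,
#  - ASSUMPTION: start a new cluster when the gap to the previous
#    observation exceeds max_gap,
#  - keep the earliest row of each cluster, carrying the highest
#    priority value found anywhere in that cluster.
collapse_group <- function(g) {
  g <- g[order(g$date), , drop = FALSE]
  cluster <- cumsum(c(TRUE, diff(as.numeric(g$date)) > max_gap))
  pieces <- lapply(split(g, cluster), function(cl) {
    first <- cl[1, , drop = FALSE]
    first$prior <- prior_levels[max(match(cl$prior, prior_levels))]
    first
  })
  do.call(rbind, unname(pieces))
}

# Split by the (x, y) combination, collapse each group, and recombine.
groups <- split(a0, list(a0$x, a0$y), drop = TRUE)
result <- do.call(rbind, unname(lapply(groups, collapse_group)))
result
```

On the example data this keeps obs 1 (prior changed to A), obs 5 (prior
changed to C) and obs 2 (unchanged), although the rows may come out in a
different order than in the table above. Groups that contain a single
observation pass through untouched, which covers the "keep all the other
observations" rule.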
I hope I have explained the problem clearly. It seems rather complex. If
the dataset is small, we can do it manually, but that is impossible for a
large dataset. Does anybody have ideas on how to do this?
Any suggestions or help would be appreciated.
--
-----------------
Jane Chang
Queen's