Hi,

It is not that clear. If VAR1 is a match between columns AB001A and AB0002A, VAR2 between AB001A and AB362, and VAR3 between AB0002A and AB362, the following may work. Also, I assume the row 8 match would be taken as 1.

dat1<- read.table(text="
S.No AB001A AB0002A AB362
1    -/-    C/C     A/A
2    C/C    C/C     A/A
3    C/C    C/C     A/A
4    C/C    C/C     A/A
5    C/C    C/C     A/A
6    C/C    C/C     A/A
7    C/C    C/C     A/A
8    -/-    -/-     -/-
9    C/C    C/C     A/A
10   C/C    C/C     A/A
11   -/-    C/C     A/A
12   C/C    C/C     A/A
13   C/C    C/C     A/A
14   C/C    C/C     A/A
16   C/C    -/-     A/A
17   -/-    C/C     A/A
18   C/C    C/C     A/A
19   C/C    C/C     A/A
",sep="",header=TRUE,stringsAsFactors=FALSE)
library(plyr)
res<-mutate(dat1,VAR1=1*(AB001A==AB0002A),VAR2=1*(AB001A==AB362),
VAR3=1*(AB0002A==AB362),SUM=rowSums(cbind(VAR1,VAR2,VAR3)),
MATCH=(SUM/3)*100,Rank=rank(MATCH))
head(res)
#  S.No AB001A AB0002A AB362 VAR1 VAR2 VAR3 SUM    MATCH Rank
#1    1    -/-     C/C   A/A    0    0    0   0  0.00000  2.5
#2    2    C/C     C/C   A/A    1    0    0   1 33.33333 11.0
#3    3    C/C     C/C   A/A    1    0    0   1 33.33333 11.0
#4    4    C/C     C/C   A/A    1    0    0   1 33.33333 11.0
#5    5    C/C     C/C   A/A    1    0    0   1 33.33333 11.0
#6    6    C/C     C/C   A/A    1    0    0   1 33.33333 11.0

#or
res<-mutate(dat1,VAR1=1*(AB001A==AB0002A),VAR2=1*(AB001A==AB362),
VAR3=1*(AB0002A==AB362),SUM=rowSums(cbind(VAR1,VAR2,VAR3)),
MATCH=(SUM/3)*100,Rank=rank(MATCH,ties.method="min"))
head(res)
#  S.No AB001A AB0002A AB362 VAR1 VAR2 VAR3 SUM    MATCH Rank
#1    1    -/-     C/C   A/A    0    0    0   0  0.00000    1
#2    2    C/C     C/C   A/A    1    0    0   1 33.33333    5
#3    3    C/C     C/C   A/A    1    0    0   1 33.33333    5
#4    4    C/C     C/C   A/A    1    0    0   1 33.33333    5
#5    5    C/C     C/C   A/A    1    0    0   1 33.33333    5
#6    6    C/C     C/C   A/A    1    0    0   1 33.33333    5
A.K.

> Hi to all bloggers,
> my data looks like this:
>
> S. No  AB001A  AB0002A  AB362  VAR1  VAR2  VAR3  SUM  %Match  Rank
> 1      -/-     C/C      A/A
> 2      C/C     C/C      A/A
> 3      C/C     C/C      A/A
> 4      C/C     C/C      A/A
> 5      C/C     C/C      A/A
> 6      C/C     C/C      A/A
> 7      C/C     C/C      A/A
> 8      -/-     -/-      -/-
> 9      C/C     C/C      A/A
> 10     C/C     C/C      A/A
> 11     -/-     C/C      A/A
> 12     C/C     C/C      A/A
> 13     C/C     C/C      A/A
> 14     C/C     C/C      A/A
> 16     C/C     -/-      A/A
> 17     -/-     C/C      A/A
> 18     C/C     C/C      A/A
> 19     C/C     C/C      A/A
>
> I want to match obs 3 with obs 2; if it matches exactly then the score
> will be 1, else 0. That will be stored in var1 for AB001A, in var2 for
> AB0002A and in var3 for AB362. I also want to calculate the sum of all the
> 1's, the observation match percent and their rank (top ten matchers). I did
> this successfully in Excel but it took me a lot of time: I used an if
> condition like =if(A3=A$2,1,0), dragged it across all obs, and then
> computed the sum over all obs, their %match and rank. My question is, how
> can I do this in R? Can I use the match package for this, or will other
> packages help me? My data is very big, with 5,15,567 obs. Can anyone guide
> me how to do this in sas, because I want to reduce the time it takes to
> analyze my data.
> Thanking you
> Regards,

Hi,
Maybe this helps:
As you wanted to match rows 3 onwards against row 2 (P2), the computed values
for row 1 and row 2 were set to NA.
dat1<- read.table(text="
S.No AB001A AB0002A AB362
P1   -/-    C/C     A/A
P2   C/C    C/C     A/A
3    C/C    C/C     A/A
4    C/C    C/C     A/A
5    C/C    C/C     A/A
6    C/C    C/C     A/A
7    C/C    C/C     A/A
8    -/-    -/-     -/-
9    C/C    C/C     A/A
10   C/C    C/C     A/A
11   -/-    C/C     A/A
12   C/C    C/C     A/A
13   C/C    C/C     A/A
14   C/C    C/C     A/A
15   C/C    -/-     A/A
16   -/-    C/C     A/A
17   A/A    A/C     A/A
18   C/A    A/A     A/A
",sep="",header=TRUE,stringsAsFactors=FALSE)
dat2<-cbind(dat1,(1*mapply("==",dat1[,-1],dat1[2,-1])))
names(dat2)[duplicated(names(dat2))]<-
paste0(names(dat2)[duplicated(names(dat2))],"_1")
library(plyr)
dat3<-mutate(dat2,SUM=rowSums(cbind(AB001A_1,AB0002A_1,AB362_1)),
MATCH=(SUM/3)*100)
dat3[1:2,5:9]<-NA
res<-mutate(dat3,RANK=rank(MATCH,ties.method="min"))
head(res)
#  S.No AB001A AB0002A AB362 AB001A_1 AB0002A_1 AB362_1 SUM MATCH RANK
#1   P1    -/-     C/C   A/A       NA        NA      NA  NA    NA   17
#2   P2    C/C     C/C   A/A       NA        NA      NA  NA    NA   18
#3    3    C/C     C/C   A/A        1         1       1   3   100    7
#4    4    C/C     C/C   A/A        1         1       1   3   100    7
#5    5    C/C     C/C   A/A        1         1       1   3   100    7
#6    6    C/C     C/C   A/A        1         1       1   3   100    7
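
Since the question asks for the top ten matchers, here is a rough base-R sketch of the same row-versus-P2 comparison in which the best match gets rank 1 and the unscored rows keep NA as their rank. It is only an illustration: the toy data frame `dat`, the helper names (`ref_row`, `geno`, `hits`, `top`) and the choice of `rank(-MATCH, na.last = "keep")` are assumptions made for this example and are not part of the code above.

```r
# Hypothetical toy data in the same layout: an ID column plus genotype columns.
dat <- data.frame(
  S.No    = c("P1", "P2", "3", "4", "5", "6"),
  AB001A  = c("-/-", "C/C", "C/C", "-/-", "A/A", "C/C"),
  AB0002A = c("C/C", "C/C", "C/C", "C/C", "A/C", "-/-"),
  AB362   = c("A/A", "A/A", "A/A", "A/A", "A/A", "-/-"),
  stringsAsFactors = FALSE
)

ref_row <- 2                       # assumption: row 2 (P2) is the reference
geno    <- as.matrix(dat[, -1])    # character matrix of genotype calls

# Repeat the reference row so it has the same shape as the data,
# then compare element by element (1 = match, 0 = no match).
ref_mat <- matrix(geno[ref_row, ], nrow = nrow(geno), ncol = ncol(geno),
                  byrow = TRUE)
hits <- (geno == ref_mat) * 1L
colnames(hits) <- paste0(colnames(geno), "_1")

res       <- cbind(dat, hits)
res$SUM   <- rowSums(hits)
res$MATCH <- 100 * res$SUM / ncol(hits)

# Rows up to and including the reference are not scored.
res[seq_len(ref_row), c(colnames(hits), "SUM", "MATCH")] <- NA

# Negate MATCH so the highest percentage gets rank 1; NA rows stay NA.
res$RANK <- rank(-res$MATCH, ties.method = "min", na.last = "keep")

# Up to ten best matchers, best first (NA ranks sort to the end).
top <- head(res[order(res$RANK), ], 10)
```

The comparison is a single vectorised operation on a character matrix, so it should scale to several hundred thousand rows without an explicit loop, although that has not been timed here.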
A.K.
> Hi Arun,
> Thank you very much for your help in solving my problem.
>
> S. No  AB001A  AB0002A  AB362  AB001A  AB0002A  AB362  SUM  %Match  Rank
> P1     -/-     C/C      A/A
> P2     C/C     C/C      A/A
> 3      C/C     C/C      A/A
> 4      C/C     C/C      A/A
> 5      C/C     C/C      A/A
> 6      C/C     C/C      A/A
> 7      C/C     C/C      A/A
> 8      -/-     -/-      -/-
> 9      C/C     C/C      A/A
> 10     C/C     C/C      A/A
> 11     -/-     C/C      A/A
> 12     C/C     C/C      A/A
> 13     C/C     C/C      A/A
> 14     C/C     C/C      A/A
> 16     C/C     -/-      A/A
>
> Actually, I want to match observations 3 to 16 with the value in
> P2 (i.e. 3 with P2, 4 with P2, 5 with P2, etc.). If they match, I would
> like to give the value 1 and store it in the corresponding dummy variable,
> i.e. AB001A, and I would like to do the same thing for the remaining
> variables too, storing the results in their dummy variables. Finally, I
> want to sum all the matches (i.e. scores of 1) in each row, calculate the
> percentage of matches, and then rank. This is what I want; sorry for not
> expressing my problem in an understandable way.