I have a medical insurance claims data file divided into blocks by member, with
multiple lines per member. I am processing these into a binary model matrix with
one row per member. Member block sizes vary from 1 to 50+ lines. I match
attributes in the claims data to columns in the model matrix and have been getting
by with a for loop, but for large files it takes much too long. Is there a
vectorized or apply-based method that does this more efficiently?

I am only targeting codes A-E, so any other code does not register.

> member <- c(rep(1,3),rep(2,2),rep(3,4),rep(4,2))
> code <- c('A','C','F','B','E','D','A','B','D','G','A')
> claims.df <- data.frame(member=member,code=code)
> claims.df
   member code
1       1    A
2       1    C
3       1    F
4       2    B
5       2    E
6       3    D
7       3    A
8       3    B
9       3    D
10      4    G
11      4    A
> code.list <- c('A','B','C','D','E')
> n.code <- length(code.list)
> mbr.list <- unique(member)
> n.mbr <- length(mbr.list)
> code.mat <- matrix(0,n.mbr,n.code)
> dimnames(code.mat) <- list(mbr.list,code.list)
> for(i in 1:n.mbr){
+   mbr.i <- claims.df[claims.df$member==mbr.list[i],]  # extract block of member claims
+ code.mat[i,unique(match(mbr.i$code,code.list))] <- 1
+ }
> code.mat
  A B C D E
1 1 0 1 0 0
2 0 1 0 0 1
3 1 1 0 1 0
4 1 0 0 0 0
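
For comparison, here is a minimal sketch of one way the loop above might be vectorized. It is not a benchmarked solution. It looks up the row and column position of every claim line with match() and then fills the matrix in a single assignment, using a two-column (row, column) index matrix. The names row.idx, col.idx, keep and code.mat2 are made up for this illustration.

```r
# Toy data from the example above
member <- c(rep(1,3), rep(2,2), rep(3,4), rep(4,2))
code <- c('A','C','F','B','E','D','A','B','D','G','A')
claims.df <- data.frame(member = member, code = code)

code.list <- c('A','B','C','D','E')
mbr.list <- unique(claims.df$member)

# Position of each claim line's member (row) and code (column);
# codes outside code.list come back as NA
row.idx <- match(claims.df$member, mbr.list)
col.idx <- match(claims.df$code, code.list)
keep <- !is.na(col.idx)   # drop claim lines with non-target codes

code.mat2 <- matrix(0, length(mbr.list), length(code.list),
                    dimnames = list(as.character(mbr.list), code.list))

# A two-column index matrix addresses individual (row, col) cells,
# so all hits are set in one assignment; repeated pairs are harmless
code.mat2[cbind(row.idx[keep], col.idx[keep])] <- 1
code.mat2
```

On the toy data this should reproduce code.mat. Whether it is fast enough on a large claims file would need to be timed on the real data.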
Craig