Mckinstry, Craig
2014-Mar-06 19:23 UTC
[R] question on more efficient data block-match processing
I have a medical insurance claims datafile divided into blocks by member, with
multiple lines per member. I am processing these into a one-line-per-member model
matrix. Member block sizes vary from 1 to 50+. I am matching attributes in the
claims data to columns in the model matrix and
have been getting by with a for loop, but for large files it takes much too
long. Is there a vectorized/apply-based method to do this more efficiently?
input data:
member code
1 A
1 C
1 F
2 B
2 E
3 D
3 A
3 B
3 D
4 G
4 A
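code.list <- c("A","B","C","D","E")
mbr.list <- unique(dat$member) #ONE ENTRY PER MEMBER (COLUMN NAME AS IN THE SAMPLE ABOVE)
n.mbr <- length(mbr.list)
matrix.mat <- matrix(0, n.mbr, length(code.list)) #ONE ROW PER MEMBER, ONE COLUMN PER CODE
for(i in 1:n.mbr){
mbr.i <- dat[dat$member==mbr.list[i],] #EXTRACT BLOCK OF MEMBER CLAIMS
matrix.mat[i,unique(match(mbr.i$code,code.list,nomatch=0))] <- 1 #CODES NOT IN code.list ARE SKIPPED
}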
output model.matrix
Member A B C D E
1 1 0 1 0 0
2 0 1 0 0 1
3 1 1 0 1 0
4 1 0 0 0 0
Craig McKinstry
Rainer Schuermann
2014-Mar-07 08:18 UTC
[R] question on more efficient data block-match processing
What I would do:
# read in your sample data
mbr <- read.table( "clipboard", header = TRUE, stringsAsFactors = FALSE )
# create a vector with the codes you want to consider
code.list <- c("A","B","C","D","E")
# reduce the data accordingly
mbr <- mbr[ mbr$code %in% code.list, ]
# get your model matrix using reshape
library( reshape )
model.matrix <- as.data.frame( cast( melt( mbr ), value ~ code ) )
# Cosmetics
colnames( model.matrix )[1] <- "Member"
model.matrix[ 2 : ( length( model.matrix[1,] ) ) ] <-
ifelse( model.matrix[ 2 : ( length( model.matrix[1,] ) ) ] > 0, 1, 0 )
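For comparison, here is a minimal base R sketch of the same idea without the reshape package. It is not part of the original reply. It assumes the data frame is called dat with columns member and code as in the posted sample; the names row.idx, col.idx, keep and mm are made up for illustration. Instead of looping over members, it computes the row and column position of every claim line with match() and fills all cells in one assignment through a two-column index matrix.

```r
# sample data from the question (column names assumed from the posted input)
dat <- data.frame(
  member = c(1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4),
  code   = c("A", "C", "F", "B", "E", "D", "A", "B", "D", "G", "A"),
  stringsAsFactors = FALSE
)
code.list <- c("A", "B", "C", "D", "E")

mbr.list <- unique(dat$member)

# row and column position of every claim line;
# col.idx is NA where the code is not in code.list (here F and G)
row.idx <- match(dat$member, mbr.list)
col.idx <- match(dat$code, code.list)
keep    <- !is.na(col.idx)

# one row per member, one column per code, initialised to 0
mm <- matrix(0L, nrow = length(mbr.list), ncol = length(code.list),
             dimnames = list(as.character(mbr.list), code.list))

# a two-column (row, column) index matrix sets all matching cells at once;
# repeated (member, code) pairs simply write the same 1 again
mm[cbind(row.idx[keep], col.idx[keep])] <- 1L
mm
```

On the sample data this should give the same 0/1 table as the desired output in the question. Whether it is fast enough on the full claims file would need to be timed there.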