DG Christensen
2012-Aug-08 15:58 UTC
[R] Advice: How to best ensure column values match in different vectors?
Hello all,

I would like some advice on how to order elements in a vector.

Background: my company is running a k-means clustering model on our historical data warehouse of products, which will produce a matrix of cluster centers. Then, on our production web servers, we will take newly created products and find the cluster that is closest to the new product (we're calling this "scoring" the product). Simple stuff. The complex part is that the data source for the model is different from the source of the new product.

My concern is how to best ensure that the order of the product attributes in the clustering model matches the attributes of the new product vector. Here's what I'm considering doing:

Say my company keeps the attributes height, width, and length on our products (in reality we'll have over 200 attributes). I will create a constant of the column (i.e. attribute) names:

    PRODUCT.ATTRIBUTE.COLS <- c("H","W","L")
    PRODUCT.ATTRIBUTE.COUNT <- length( PRODUCT.ATTRIBUTE.COLS )

All new vectors (both during modeling and scoring) will be created with NaN values:

    product.vector <- rep(NaN, PRODUCT.ATTRIBUTE.COUNT)
    names( product.vector ) <- PRODUCT.ATTRIBUTE.COLS

The vector will then be populated with attribute values like this. The values will be retrieved from whatever DB we're using:

    product.vector["H"] <- height.from.db
    product.vector["W"] <- width.from.db
    product.vector["L"] <- length.from.db

Is this a reasonable way to do this? If so, one thing I'd like to add is error checking that validates that the attribute name exists, so if the code attempted to do:

    product.vector["WEIGHT"] <- weight.from.db

it would throw some sort of error. What's the best way for handling that? Can I set the length of the vector to a fixed size?

Thanks for any guidance,
DG
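The scoring step described above can be sketched as follows. This is only an illustration under stated assumptions: the training data are made-up random numbers, `score_product()` is a hypothetical helper that does not come from the post or from any package, and squared Euclidean distance is assumed as the closeness measure. The point it shows is that both the model matrix and the new product are reordered by name, so neither data source has to agree on column position.

```r
# Attribute names as in the post; the real list would have 200+ entries.
PRODUCT.ATTRIBUTE.COLS <- c("H", "W", "L")

# Toy stand-in for the warehouse extract (made-up values),
# deliberately stored in a different column order.
set.seed(1)
train <- matrix(runif(60), ncol = 3,
                dimnames = list(NULL, c("W", "L", "H")))

# Reorder columns by name before fitting, so the centers follow the constant.
fit <- kmeans(train[, PRODUCT.ATTRIBUTE.COLS], centers = 2)

# Hypothetical helper: align the product to the centers by name,
# then return the row index of the nearest center.
score_product <- function(centers, product) {
  missing.cols <- setdiff(colnames(centers), names(product))
  extra.cols   <- setdiff(names(product), colnames(centers))
  if (length(missing.cols) > 0 || length(extra.cols) > 0) {
    stop("attribute mismatch; missing: ", paste(missing.cols, collapse = ", "),
         "; unexpected: ", paste(extra.cols, collapse = ", "))
  }
  aligned <- product[colnames(centers)]         # reorder by name, not position
  d2 <- rowSums(sweep(centers, 2, aligned)^2)   # squared distance to each center
  unname(which.min(d2))
}

# A new product whose attributes arrive in yet another order.
new.product <- c(L = 0.2, H = 0.9, W = 0.5)
score_product(fit$centers, new.product)
```

Because the helper stops on any missing or unexpected attribute name, a mismatch between the two data sources surfaces as an error at scoring time rather than as a silently misaligned distance.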
R. Michael Weylandt
2012-Aug-09 04:32 UTC
[R] Advice: How to best ensure column values match in different vectors?
On Wed, Aug 8, 2012 at 10:58 AM, DG Christensen <dgc at enservio.com> wrote:
> Is this a reasonable way to do this? If so, one thing I'd like to add
> is error checking that validates that the attribute name exists, so if
> the code attempted to do:
>
> product.vector["WEIGHT"] <- weight.from.db
>
> it would throw some sort of error. What's the best way for handling
> that? Can I set the length of the vector to a fixed size?

Hi DG,

You can define your own class that raises an error on assignment to column names that don't exist. E.g.,

    # Tag an atomic object with the "strictvec" class
    as.strictvec <- function(x){
      stopifnot(is.atomic(x))
      class(x) <- c("strictvec", class(x))
      x
    }

    # Assignment method: refuse column names that are not already present
    `[<-.strictvec` <- function(x, i, j, value){
      stopifnot(j %in% colnames(x))
      NextMethod()
    }

    z <- matrix(1:3, ncol = 3); colnames(z) <- letters[1:3]
    z.strict <- as.strictvec(z)

    z[, "d"] <- 5
    z.strict[, "d"] <- 5 # Error!

Adapt as needed.

Cheers,
Michael
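The original question concerns a named vector rather than a matrix, and for a plain named vector, assigning to a name that is not present silently appends a new element. One way the class idea above might be adapted to that case is sketched below. The constructor name `strict_vector()` and the error messages are hypothetical, and the method assumes single-index assignment (`x[i] <- value`) with a character or numeric index.

```r
# Hypothetical constructor: a NaN-filled numeric vector with a fixed set of names.
strict_vector <- function(cols) {
  x <- rep(NaN, length(cols))
  names(x) <- cols
  class(x) <- c("strictvec", class(x))
  x
}

# Assignment method for the vector case: reject unknown names and
# positions beyond the current length, so the vector can never grow.
`[<-.strictvec` <- function(x, i, value) {
  if (is.character(i)) {
    unknown <- setdiff(i, names(x))
    if (length(unknown) > 0) {
      stop("unknown attribute(s): ", paste(unknown, collapse = ", "))
    }
  } else if (is.numeric(i) && any(i > length(x))) {
    stop("index beyond fixed length ", length(x))
  }
  NextMethod()
}

product.vector <- strict_vector(c("H", "W", "L"))
product.vector["H"] <- 1.5             # fine: "H" is a known attribute
try(product.vector["WEIGHT"] <- 10)    # error: unknown attribute(s): WEIGHT
length(product.vector)                 # still 3
```

This sketch only intercepts `[<-`. Assignments made through `[[<-` or `names<-` are not checked, so those would need their own methods if the production code uses them.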