lindsey at stat.fsu.edu
2009-Feb-02 21:10 UTC
[R] Using Information from the Stats4 package in base envir
Hi. Thank you very much in advance for your help.

I have generated data from two simple linear models and used k-means clustering (stats4) to identify two clusters in the generated data. Next, I would like to fit a simple linear regression to each cluster separately. I can do this if I first use the cluster labels to define two separate data frames with the subset function. However, I would ideally like to use the subset option in lm to select the data for each regression rather than creating separate data frames. When I try to do it this way, I get the error "Error in eval(expr, envir, enclos) : invalid 'envir' argument". The code is given below.

If it is not possible to do this as an option within lm, is there another way that avoids creating new data frames defined by a given cluster label?

Thanks again for your help.

library(stats4)

#Define the Models
#Two simple linear models:
#Model A: respA = a0 + a1x + e(a), e(a) ~N(0, sigma^2(a))
#Model B: respB = b0 + b1x + e(b), e(b) ~N(0, sigma^2(b))
a0 = 0; a1 = 1.5; sigmaA = 4;
b0 = 50; b1 = -2; sigmaB = 4;
n = 250; min = 0; max = 50;
nummod = 2; #Number of models

#Generate Data from the Models
x1 = runif(n, min, max); #Explanatory variable
eA = rnorm(n, 0, sigmaA);
eB = rnorm(n, 0, sigmaB);
respA = a0 + a1*x1 + eA; #Responses
respB = b0 + b1*x1 + eB;

#K-Means Clustering
resp1 = c(respA, respB); #Put response into single vector for k-means
x12 = rep(x1, nummod); #Put explanatory into a single vector
data1 = data.frame(resp1, x12) #Data frame for kmeans
cluster1 = kmeans(data1, 2, nstart=25)$cluster #Obtain cluster labels
data1 = data.frame(data1, cluster1) #Cluster labels in third column

data10 = subset(data1, cluster1==1)
data11 = subset(data1, cluster1==2)
model10 = lm(resp1 ~ x12, data10) #It works using the subset data frame

model1 = lm(resp1 ~ x12, cluster1 == 1, data1); #Gives the following error
Error in eval(expr, envir, enclos) : invalid 'envir' argument
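
One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: the arguments of lm are, in order, formula, data, subset. In the failing call above, the logical vector cluster1 == 1 is therefore matched to data and data1 is matched to subset. Naming the arguments avoids relying on position. The example below repeats the simulation in condensed form. The seed, the literal replication count of 2, and the object names fit_subset and fit_frame are assumptions made for illustration and are not part of the original code.

```r
# Condensed version of the simulation above
set.seed(1)  # hypothetical seed, added only so the sketch is repeatable
n <- 250
x1 <- runif(n, 0, 50)                     # explanatory variable
respA <- 0 + 1.5 * x1 + rnorm(n, 0, 4)    # Model A responses
respB <- 50 - 2 * x1 + rnorm(n, 0, 4)     # Model B responses

# Stack both models into one data frame (two copies of x1, one per model)
data1 <- data.frame(resp1 = c(respA, respB), x12 = rep(x1, 2))

# k-means on the two columns, then store the labels as a third column
data1$cluster1 <- kmeans(data1, 2, nstart = 25)$cluster

# Named arguments: data is the data frame, subset is the logical condition,
# which lm evaluates inside data1
fit_subset <- lm(resp1 ~ x12, data = data1, subset = cluster1 == 1)

# Same fit on an explicitly subsetted data frame, for comparison
fit_frame <- lm(resp1 ~ x12, data = subset(data1, cluster1 == 1))

# Compare the two sets of coefficients
all.equal(coef(fit_subset), coef(fit_frame))
```

Replacing cluster1 == 1 with cluster1 == 2 gives the fit for the second cluster without creating any intermediate data frames.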