lindsey at stat.fsu.edu
2009-Feb-02 21:10 UTC
[R] Using Information from the Stats4 package in base envir
Hi. Thank you very much in advance for your help.
I have generated data from two simple linear models and used k-means
clustering (the kmeans function) to identify two clusters in the generated data.
Next, I would like to do simple linear regression for each separate
cluster. I can do this if I first use the cluster labels to define
two separate data frames with the subset function.
However, I would ideally like to use the subset option in lm to
identify the data for regression rather than creating separate data
frames. When I try to do it this way, I get the error "Error in
eval(expr, envir, enclos) : invalid 'envir' argument". The code for
this is given below.
If it is not possible to do this as an option within lm, is there
another way rather than creating new data frames that are defined by a
given cluster label?
Thanks again for your help.
library(stats4)
#Define the Models
#Two simple linear models:
#Model A: respA = a0 + a1x + e(a), e(a) ~N(0, sigma^2(a))
#Model B: respB = b0 + b1x + e(b), e(b) ~N(0, sigma^2(b))
a0= 0; a1 = 1.5; sigmaA=4; b0=50; b1=-2; sigmaB=4; n=250; min=0; max=50;
#Generate Data from the Models
x1 = runif(n, min, max); #Explanatory variable
eA = rnorm(n, 0, sigmaA); eB = rnorm(n, 0, sigmaB);
respA = a0 +a1*x1 + eA; respB = b0 +b1*x1 + eB; #Responses
#K-Means Clustering
resp1 = c(respA, respB);#Put response into single vector for k-means
nummod = 2; #Number of models (A and B)
x12 = rep(x1, nummod); #Put explanatory into a single vector
data1 = data.frame(resp1, x12) #Data frame for kmeans
cluster1 = kmeans(data1, 2, nstart=25)$cluster #Obtain cluster labels
data1 = data.frame(data1 ,cluster1)#Cluster labels in third column
data10 = subset(data1, cluster1==1)
data11 = subset(data1, cluster1==2)
model10 = lm(resp1 ~ x12, data10)#It works using the subset data frame
model1 = lm(resp1 ~ x12, cluster1 == 1, data1); #Gives the following error
Error in eval(expr, envir, enclos) : invalid 'envir' argument
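A possible explanation for the error, offered as a sketch rather than a confirmed diagnosis: the formal arguments of lm come in the order formula, data, subset, so an unnamed second argument is matched to data. In the failing call the logical vector cluster1 == 1 would therefore be taken as the data and data1 as the subset. Naming the arguments avoids this. The example below rebuilds a small version of the data (the seed and the split-based variant are my own assumptions, not part of the original code) and checks that the subset argument gives the same fit as the separate data frame.

```r
# Minimal sketch: regenerate data similar to the post (seed is an assumption)
set.seed(1)
n <- 250
x1 <- runif(n, 0, 50)
respA <- 0 + 1.5 * x1 + rnorm(n, 0, 4)   # Model A
respB <- 50 - 2 * x1 + rnorm(n, 0, 4)    # Model B

data1 <- data.frame(resp1 = c(respA, respB), x12 = rep(x1, 2))
# Cluster on the two original columns, then store the labels
data1$cluster1 <- kmeans(data1[, c("resp1", "x12")], 2, nstart = 25)$cluster

# Name the arguments so that data and subset are matched as intended
model1 <- lm(resp1 ~ x12, data = data1, subset = cluster1 == 1)

# Reference fit using an explicit subset data frame, as in the post
model10 <- lm(resp1 ~ x12, data = subset(data1, cluster1 == 1))
same_fit <- isTRUE(all.equal(coef(model1), coef(model10)))

# Alternative: fit one model per cluster without naming new data frames
models <- lapply(split(data1, data1$cluster1),
                 function(d) lm(resp1 ~ x12, data = d))
```

With named arguments the order no longer matters, so lm(resp1 ~ x12, subset = cluster1 == 1, data = data1) should behave the same way. The split/lapply form is only one option for handling every cluster in a single step.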
