Dear useRs,

I have about 27 descriptor vectors, each with 703 elements. I want to do the following:

1. Regress a dependent vector, say B, which has the same number of elements, on each of these 27 vectors individually.
2. Find the best 10 regression results when B is regressed on random combinations of any 4 descriptors. More precisely, in the first step we regress the dependent vector on each descriptor individually; now we want R to randomly combine the descriptors in sets of 4, regress B on each set, and show which 10 combinations of descriptors give the best regression results with B.

I hope I am clear. I know the 2nd part is trickier, but I will be extremely happy if you can answer either of the above questions.

Thanks in advance,
eliza
Hi,

Maybe this helps.

set.seed(8)
mat1<-matrix(sample(150,90,replace=FALSE),ncol=9,nrow=10)
dat1<-data.frame(mat1)
set.seed(10)
B<-sample(150:190,10,replace=FALSE)

res1<-lapply(dat1,function(x) lm(B~as.matrix(x)))
#or (the coefficient row is then labelled x instead of as.matrix(x))
res1<-lapply(dat1,function(x) lm(B~x))

res1Summary<-lapply(res1,summary)
#to get the coefficients
res1SummaryCoef<-lapply(res1,function(x) summary(x)$coefficients)
res1SummaryCoef[1:3]
#$X1
#                Estimate Std. Error   t value     Pr(>|t|)
#(Intercept)  150.1303702 8.45536736 17.755630 1.035959e-07
#as.matrix(x)   0.2126583 0.09304937  2.285436 5.163141e-02
#
#$X2
#                  Estimate Std. Error     t value     Pr(>|t|)
#(Intercept)  168.219302287  6.9904434 24.06418202 9.479720e-09
#as.matrix(x)  -0.002386046  0.1146838 -0.02080544 9.839104e-01
#
#$X3
#               Estimate Std. Error   t value     Pr(>|t|)
#(Intercept)  180.303999  8.6675156 20.802270 2.990115e-08
#as.matrix(x)  -0.157268  0.1021179 -1.540064 1.621101e-01

#to get the p-value of the F statistic
res1pvalueF<-lapply(res1,function(x) pf(summary(x)$fstatistic[1],summary(x)$fstatistic[2],summary(x)$fstatistic[3],lower.tail=FALSE))
#to get the r.squared value
res1rSquare<-lapply(res1,function(x) summary(x)$r.squared)

#2nd part
#Create some new datasets using random combinations of columns from dat1
dat2<-dat1[,sample(names(dat1),4)]
dat3<-dat1[,sample(names(dat1),4)]
dat4<-dat1[,sample(names(dat1),4)]
dat5<-dat1[,sample(names(dat1),4)]
dat6<-dat1[,sample(names(dat1),4)]
head(dat2)
#   X7  X3  X8  X5
#1  85  30 113 100
#2  89  53 115  32
#3  74  79  63  54
#4  57  28  52  94
#5   6  84 135 132
#6   5 123 146 127
head(dat3)
#   X8  X2  X6  X3
#1 113  64  14  30
#2 115  13   7  53
#3  63  60  15  79
#4  52  75  34  28
#5 135  19 107  84
#6 146 126  27 123

#create a list of data frames
list1<-list(dat2,dat3,dat4,dat5,dat6)
res2<-lapply(list1,function(x) lm(B~as.matrix(x)))
res2rSquare<-lapply(res2,function(x) summary(x)$r.squared)
unlist(res2rSquare)
#[1] 0.8444332 0.6316695 0.6971695 0.7322519 0.4328805

To select the best model from combinations of descriptors, you can also look at stepwise elimination, or at selection based on AIC or BIC values.

A.K.

----- Original Message -----
From: eliza botto <eliza_botto at hotmail.com>
To: "r-help at r-project.org" <r-help at r-project.org>
Sent: Friday, October 26, 2012 4:00 PM
Subject: [R] regression analysis in R

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
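Since the question asks for the 10 best 4-descriptor combinations, one alternative to sampling a handful of sets at random is to fit every combination and rank them. The following is only a sketch under stated assumptions: the data are simulated stand-ins (12 descriptors and 200 observations rather than 27 and 703, to keep it quick), and the names dat, dat_all, combos, r2_of and best10 are made up for illustration. With the real data there would be choose(27, 4) = 17550 models to fit, which is still feasible.

```r
# Simulated stand-in data (an assumption, not the poster's data).
set.seed(1)
n <- 200
p <- 12
dat <- as.data.frame(matrix(rnorm(n * p), nrow = n))
names(dat) <- paste0("X", seq_len(p))
B <- 2 * dat$X3 - dat$X7 + 0.5 * dat$X11 + rnorm(n)
dat_all <- data.frame(B = B, dat)

# Every set of 4 descriptors, one set per column: choose(12, 4) = 495 columns.
combos <- combn(names(dat), 4)

# R-squared of B regressed on one set of descriptors (hypothetical helper).
r2_of <- function(vars) {
  fit <- lm(reformulate(vars, response = "B"), data = dat_all)
  summary(fit)$r.squared
}
r2 <- apply(combos, 2, r2_of)

# Keep the 10 combinations with the largest R-squared.
top <- order(r2, decreasing = TRUE)[1:10]
best10 <- data.frame(
  descriptors = apply(combos[, top], 2, paste, collapse = " + "),
  r.squared   = r2[top]
)
print(best10)
```

Ranking by R-squared is only meaningful here because every candidate has the same number of predictors; the winners of such a search are best read as exploratory.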
Hello,

Using the same example, add the following lines at the end to have the models ordered by AIC.

aic <- lapply(res2, AIC)
idx <- order(unlist(aic))
lapply(list1[idx], names)

And if there are more than 10 models and you want the 10 best:

best10 <- idx[1:10]
lapply(list1[best10], names)

Hope this helps,

Rui Barradas

On 26-10-2012 22:47, arun wrote:
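A side note on the choice of criterion, offered as a sketch rather than anything taken from the posts above: when every candidate model has the same response and the same number of predictors, AIC, BIC and R-squared are all monotone functions of the residual sum of squares, so they should rank the models identically (assuming every fit is full rank). The criteria only start to disagree when models of different sizes are compared. The simulated data and the names X, dat_all, combos, fits and by_aic below are assumptions for illustration.

```r
# Simulated stand-in data (an assumption): 8 descriptors, 60 observations.
set.seed(42)
n <- 60
X <- as.data.frame(matrix(rnorm(n * 8), nrow = n))   # columns V1..V8
B <- X$V1 - 0.5 * X$V4 + rnorm(n)
dat_all <- data.frame(B = B, X)

# All 70 sets of 4 descriptors, as a list of character vectors.
combos <- combn(names(X), 4, simplify = FALSE)
fits <- lapply(combos, function(v) {
  lm(reformulate(v, response = "B"), data = dat_all)
})

aic <- sapply(fits, AIC)
bic <- sapply(fits, BIC)
r2  <- sapply(fits, function(f) summary(f)$r.squared)

# Order by AIC (smallest first) and list the 10 best descriptor sets.
by_aic <- order(aic)
sapply(combos[by_aic[1:10]], paste, collapse = " + ")

# Sorted by AIC, BIC never decreases and R-squared never increases
# (small tolerances allow for floating-point rounding).
all(diff(bic[by_aic]) >= -1e-9)
all(diff(r2[by_aic]) <= 1e-12)
```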
On 2012-10-26 13:00, eliza botto wrote:
> Dear useRs,
> I have about 27 descriptor vectors, each with 703 elements. I want to do the following:
> 1. Regress a dependent vector, say B, which has the same number of elements, on each of these 27 vectors individually.
> 2. Find the best 10 regression results when B is regressed on random combinations of any 4 descriptors. More precisely, in the first step we regress the dependent vector on each descriptor individually; now we want R to randomly combine the descriptors in sets of 4, regress B on each set, and show which 10 combinations of descriptors give the best regression results with B.
> I hope I am clear. I know the 2nd part is trickier, but I will be extremely happy if you can answer either of the above questions.
> Thanks in advance,
> eliza

I hope that you're doing _exploratory_ data analysis.
Have a look at the 'leaps' package. It might be suitable.

Peter Ehlers
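For reference, a sketch of what using the leaps package might look like. It assumes leaps is installed (the code does nothing otherwise) and uses simulated data with made-up names (dat, fit, s, vars, size4, top10). regsubsets() runs an exhaustive search by default and keeps the nbest best subsets of each size up to nvmax, so there is no need to fit every combination by hand. In the spirit of the reply, the output is best treated as exploratory.

```r
if (requireNamespace("leaps", quietly = TRUE)) {
  # Simulated stand-in data (an assumption): 12 descriptors, 200 observations.
  set.seed(7)
  n <- 200
  dat <- as.data.frame(matrix(rnorm(n * 12), nrow = n))   # columns V1..V12
  dat$B <- 2 * dat$V2 - dat$V5 + 0.5 * dat$V9 + rnorm(n)

  # Keep the 10 best subsets of each size from 1 to 4 descriptors.
  fit <- leaps::regsubsets(B ~ ., data = dat, nbest = 10, nvmax = 4)
  s <- summary(fit)

  # s$which is a logical matrix: one row per retained model, one column
  # per term. Drop the intercept column and keep the 4-descriptor models.
  vars <- s$which[, colnames(s$which) != "(Intercept)", drop = FALSE]
  size4 <- which(rowSums(vars) == 4)
  best <- size4[order(s$rsq[size4], decreasing = TRUE)]

  top10 <- data.frame(
    descriptors = unname(apply(vars[best, , drop = FALSE], 1, function(r) {
      paste(colnames(vars)[r], collapse = " + ")
    })),
    r.squared = s$rsq[best]
  )
  print(top10)
}
```

The summary object also carries adjr2, cp and bic, which are the more appropriate columns to sort on when comparing subsets of different sizes.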