Hi R Users,

I am going to run a multiple linear regression with around 57 independent variables. Each time I run the model with just 11 variables, the results are reasonable. When I increase the number of independent variables beyond 11, the coefficients come out as "NA" in the output. Is there any limitation on the number of independent variables in multiple linear regression in R?

I attached my dataset as well as my R code below:

mlr.data <- read.table("./multiple.txt", header = TRUE)
mlr.output <- lm(formula = CaV ~ SHG + TrD + CrH + SPAD + FlN + FrN + YT + LA + LDMP +
    B + Cu + Zn + Mn + Fe + K + P + N +
    Clay30 + Silt30 + Sand30 + Clay60 + Silt60 + Sand60 +
    ESP30 + NaEx30 + CEC30 + Cl30 + SAR30 + KSol30 + NaSol30 + CaMgSol3 +
    ZnAv30 + FeAv30 + OC30 + PAv30 + KAv30 + TNV30 + pH30 + EC30 + SP30 +
    ESP60 + NaEx60 + CEC60 + Cl60 + SAR60 + KSol60 + NaSol60 + CaMgSol6 +
    ZnAv60 + FeAv60 + OC60 + PAv60 + KAv60 + TNV60 + pH60 + EC60 + SP60,
    data = mlr.data)
summary(mlr.output)

Regards,
Reza

-------------- next part --------------
CaV SHG TrD CrH SPAD FlN FrN YT LA LDMP B Cu Zn Mn Fe K P N Clay30 Silt30 Sand30 Clay60 Silt60 Sand60 ESP30 NaEx30 CEC30 Cl30 SAR30 KSol30 NaSol30 CaMgSol3 ZnAv30 FeAv30 OC30 PAv30 KAv30 TNV30 pH30 EC30 SP30 ESP60 NaEx60 CEC60 Cl60 SAR60 KSol60 NaSol60 CaMgSol6 ZnAv60 FeAv60 OC60 PAv60 KAv60 TNV60 pH60 EC60 SP60
49.83 15.8 22.32 45.8 82.7 126 5 55.8 34.7 57.7 17.9 8.4 14.7 50.5 144.7 0.9 0.1 1.7 14 50 36 10 26 64 3.2 0.3 11.9 87.7 5.6 2.5 25.7 69.8 2.9 3.1 0.66 64.4 360 14.8 9.94 0.6 39.6 4.8 0.2 12.2 51.4 6.9 1.4 33.4 69.9 1.8 3.4 0.2 3.5 290 11.5 7.19 3.21 40.1
37.85 18.9 20.89 53.2 75.9 169 4 67.4 40.1 58.6 15.8 7.9 12.1 53.2 141.6 0.7 0.1 1.9 13 51 34 11 28 64.0 1.9 0.2 12.5 45.7 7.8 3.8 22.9 77.9 3.1 2.5 0.69 45.6 390 14.8 7.2 1.1 40.1 5.4 0.1 13.3 59.9 7.9 1.7 56.6 58.9 1.5 5.5 0.6 1.9 350 10.2 7.1 3.7 42.1
64.12 20.7 19.04 43.9 74.3 133 5 55.9 38.7 60.0 12.1 6.3 12.6 47.4 159.5 1.0 0.1 1.6 12 48 33 13 26 62.0 6.4 1.1 12.6 33.2 10.1 1.8 43.8 99.3 1.5 2.8 0.57 20.5 470 15.1 7.1 0.7 39.8 4.9 0.3 10.9 65.5 7.1 2.3 42.1 66.4 1.1 3.9 0.7 2.9 288.9 12.2 7.2 4.1 35.6
90.28 14.6 19.52 56.9 61.9 145 7 66.5 33.2 59.1 13.7 4.7 10.0 52.1 241.1 0.8 0.1 1.9 13 47 32 10 30 57.0 5.5 0.3 11.7 31.2 3.5 2.4 50.4 65.7 1.9 2.5 0.88 41.9 398 14.3 7 2.4 38.7 3.1 0.5 14.1 71.9 8.4 1.9 36.7 59.1 0.9 2.6 0.5 2.7 290.7 13.6 7 2.6 43.7
111.18 13.2 16.53 61.3 78 127 6 49.9 41.7 51.6 14.7 4.7 10.0 55.8 148.9 0.7 0.1 1.9 14 46 37 11 28 60.0 3.9 0.7 11 21.7 4.3 0.9 33.7 73.6 2.3 2.7 0.78 33.4 349 15.2 7.4 0.9 39.1 2.8 0.1 12.5 76.6 8.9 3.1 32.1 71.4 0.5 6.9 0.5 2.5 256.9 15.1 7.5 1.7 36.7
59.11 12.9 21.34 45.3 78.9 178 6 63.3 39.8 52.0 19.5 4.2 8.9 54.7 229.5 0.7 0.1 1.7 13 46 36 12 30 62.0 2.7 0.7 12.9 19.5 2.6 2.8 61.2 86.9 2.2 3.7 0.86 27.8 400.5 17.1 7.1 2.1 39.9 3.9 0.3 11.9 57.4 6.7 1.6 30.1 89.9 1.8 5.8 0.3 3.7 224.8 12.9 7.3 5.5 34.9
80.89 17.9 15.86 40.3 66.8 154 7 45.6 36.8 47.8 21.6 3.7 12.6 54.7 162.1 0.7 0.1 1.9 11 50 35 13 31 61.0 2.9 0.4 10.9 37.9 7.1 1.9 19.5 55.8 2.8 2.9 0.66 45.1 459 15.6 7.2 0.8 36.1 5.1 0.4 13.1 85.5 5.7 2.1 29.1 92.1 1.9 7.8 0.7 2.8 278.9 11.8 7.2 6.1 32.7
122.74 16.6 17.29 43 77.8 140 6 32.9 37.7 55.6 20.0 4.2 12.1 47.4 152.6 0.9 0.1 1.7 14 49 36 11 25 58.0 3.2 1.5 11.5 24.5 3.9 3.7 20.8 56.9 1.9 2.6 0.72 40.7 398 16.3 7.3 0.7 37.7 2.2 0.1 11.8 55.5 6.1 1.8 39.9 69.9 2.1 2.4 0.3 1.8 312.8 10.6 7.4 7.3 31.2
20.08 7.6 18.28 33.8 60.5 81 1 11.3 37.0 59.8 21.3 3.2 4.9 67.5 153.4 0.5 0.1 0.7 16 46 38 15 45 40 7.7 0.8 11.2 139.8 10.7 1.3 80.7 106.6 2.4 2.4 0.6 10.9 345.8 14.8 7.5 12.2 36.3 6.3 0.7 10.5 108.2 9.8 0.5 63.7 84.4 1.3 2.5 0.4 3.3 213.0 15.7 7.6 10.5 38.2
22.09 11.8 14.44 40.3 61.4 67 2 13.5 34.6 58.7 21.6 2.6 7.6 57.4 186.5 0.5 0.1 0.6 13 48 35 12 30 59 6.8 0.8 12.1 45.8 5.6 1.0 26.1 37.1 1.3 2.5 0.6 4.3 407.2 17.5 7.7 4.7 34.5 6.9 0.8 12.0 65.9 7.4 0.7 38.2 50.9 0.9 2.9 0.4 1.9 265.3 18.3 7.6 6.8 35.2
25.04 8.4 15.29 46.3 65.5 60 1 11.0 33.2 56.2 21.4 2.2 10.1 54.6 167.8 0.4 0.1 0.8 13 48 35 12 30 58 11.3 1.2 10.4 79.7 10.4 0.8 60.6 61.9 0.9 2.2 0.5 3.6 341.3 16.3 7.6 8.3 33.3 9.0 1.0 10.8 82.0 11.8 0.5 65.7 60.5 0.5 2.1 0.3 0.7 205.0 17.9 7.6 8.8 35.1
27.77 23.9 17.80 58.6 80.1 77 2 16.7 37.1 54.9 17.6 3.8 2.7 43.3 108.3 0.4 0.1 1.1 13 48 35 12 31 57 5.3 0.4 8.3 117.0 14.5 2.0 96.7 71.8 0.9 2.4 0.5 3.9 395.0 15.0 7.7 10.0 32.8 11.8 0.9 8.1 73.4 10.7 1.6 57.9 58.1 0.4 3.1 0.3 1.1 193.8 15.8 7.7 8.0 30.1
34.03 8.7 21.02 50.3 68.7 58 1 15.5 36.8 55.9 18.4 2.5 4.6 41.3 141.2 0.4 0.1 0.9 13 48 36 12 31 57 4.4 0.5 10.2 34.1 4.7 1.3 22.4 30.8 0.7 2.5 0.8 4.8 446.4 14.8 7.7 4.3 33.4 4.0 0.4 10.5 58.8 7.6 0.8 38.9 46.6 0.4 2.3 0.5 1.7 316.7 15.7 7.7 6.4 34.2
David L Carlson
2012-Feb-13 16:36 UTC
[R] multi-regression with more than 50 independent variables
You need to spend some time reading about multiple regression. In statistics there is always what is possible and what is advisable. I'm not going to address whether a regression with 57 independent variables is advisable, only whether it is possible. For your data, it is not possible. The attached data contain only 13 observations, so the model can estimate at most 13 coefficients: the intercept plus 12 independent variables. Consider the following example:

example <- data.frame(y=rnorm(3), x1=rnorm(3), x2=rnorm(3), x3=rnorm(3))
lm(y~x1 + x2, example)
lm(y~x1 + x2 + x3, example)

We create four variables using random normal numbers for 3 cases (rows). The first regression (2 independent variables) "works" (i.e. there are no NAs). The second produces an NA for the third independent variable, because three observations can determine at most three coefficients: the intercept and two slopes.

In my example, the three random variables are not correlated with one another. In your data there must be correlations among the 57 variables so that you are only getting slope values for 11.

----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of R DF
Sent: Monday, February 13, 2012 9:19 AM
To: r-help at r-project.org
Subject: [R] multi-regression with more than 50 independent variables

Hi R Users,

I am going to run a multiple linear regression with around 57 independent variables. Each time I run the model with just 11 variables, the results are reasonable. With increasing the number of independent variables more than 11, the coefficients will get "NA" in the output. Is there any limitation for the number of independent variables in multiple linear regressions in R?
I attached my dataset as well as R codes below:

mlr.data <- read.table("./multiple.txt", header = TRUE)
mlr.output <- lm(formula = CaV ~ SHG + TrD + CrH + SPAD + FlN + FrN + YT + LA + LDMP +
    B + Cu + Zn + Mn + Fe + K + P + N +
    Clay30 + Silt30 + Sand30 + Clay60 + Silt60 + Sand60 +
    ESP30 + NaEx30 + CEC30 + Cl30 + SAR30 + KSol30 + NaSol30 + CaMgSol3 +
    ZnAv30 + FeAv30 + OC30 + PAv30 + KAv30 + TNV30 + pH30 + EC30 + SP30 +
    ESP60 + NaEx60 + CEC60 + Cl60 + SAR60 + KSol60 + NaSol60 + CaMgSol6 +
    ZnAv60 + FeAv60 + OC60 + PAv60 + KAv60 + TNV60 + pH60 + EC60 + SP60,
    data = mlr.data)
summary(mlr.output)

Regards,
Reza
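
The limit discussed in this thread can be checked with a small simulation. What follows is only a minimal sketch under stated assumptions: it uses simulated, unrelated predictors instead of the attached file, and the names `sim`, `n`, `p`, `n_estimated` and `n_dropped` are made up for illustration. It fits 20 predictors to 13 observations and counts how many coefficients `lm()` estimates and how many it reports as NA.

```r
# Simulated stand-in for the posted data (an assumption, not the real file):
# 13 observations, 20 unrelated random predictors.
set.seed(1)   # arbitrary seed, only so the run is reproducible
n <- 13       # number of observations, as in the thread
p <- 20       # more predictors than the data can support

sim <- as.data.frame(matrix(rnorm(n * (p + 1)), nrow = n))
names(sim) <- c("y", paste0("x", seq_len(p)))

# Regress y on every other column.
fit <- lm(y ~ ., data = sim)

n_estimated <- sum(!is.na(coef(fit)))  # coefficients lm() could estimate
n_dropped   <- sum(is.na(coef(fit)))   # coefficients reported as NA

cat("observations:          ", n, "\n")
cat("estimated coefficients:", n_estimated, "(intercept included)\n")
cat("NA coefficients:       ", n_dropped, "\n")
cat("residual df:           ", df.residual(fit), "\n")
```

With 13 rows this should report 13 estimated coefficients (the intercept plus 12 slopes), NA for the rest, and zero residual degrees of freedom, so even the estimated slopes come without usable standard errors. On the real data, the same quantities can be read from `mlr.output$rank` and `df.residual(mlr.output)`; correlated or constant columns would lower the number of estimable slopes further.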