Hi R Users,
I am going to run a multiple linear regression with around 57 independent
variables. Whenever I run the model with just 11 variables, the results
are reasonable. When I increase the number of independent variables beyond
11, the coefficients come out as "NA" in the output. Is there any limit on
the number of independent variables in multiple linear regression in R?
I have attached my dataset, and my R code is below:
mlr.data <- read.table("./multiple.txt", header = TRUE)
mlr.output <- lm(formula = CaV ~ SHG + TrD + CrH + SPAD + FlN + FrN + YT +
  LA + LDMP + B + Cu + Zn + Mn + Fe + K + P + N + Clay30 + Silt30 + Sand30 +
  Clay60 + Silt60 + Sand60 + ESP30 + NaEx30 + CEC30 + Cl30 + SAR30 + KSol30 +
  NaSol30 + CaMgSol3 + ZnAv30 + FeAv30 + OC30 + PAv30 + KAv30 + TNV30 + pH30 +
  EC30 + SP30 + ESP60 + NaEx60 + CEC60 + Cl60 + SAR60 + KSol60 + NaSol60 +
  CaMgSol6 + ZnAv60 + FeAv60 + OC60 + PAv60 + KAv60 + TNV60 + pH60 + EC60 +
  SP60, data = mlr.data)
summary(mlr.output)
Regards,
Reza
-------------- next part --------------
CaV SHG TrD CrH SPAD FlN FrN YT LA LDMP B Cu Zn Mn Fe K P N Clay30 Silt30 Sand30
Clay60 Silt60 Sand60 ESP30 NaEx30 CEC30 Cl30 SAR30 KSol30 NaSol30 CaMgSol3
ZnAv30 FeAv30 OC30 PAv30 KAv30 TNV30 pH30 EC30 SP30 ESP60 NaEx60 CEC60 Cl60
SAR60 KSol60 NaSol60 CaMgSol6 ZnAv60 FeAv60 OC60 PAv60 KAv60 TNV60 pH60 EC60
SP60
49.83 15.8 22.32 45.8 82.7 126 5 55.8 34.7 57.7 17.9 8.4 14.7 50.5 144.7 0.9 0.1
1.7 14 50 36 10 26 64 3.2 0.3 11.9 87.7 5.6 2.5 25.7 69.8 2.9 3.1 0.66 64.4 360
14.8 9.94 0.6 39.6 4.8 0.2 12.2 51.4 6.9 1.4 33.4 69.9 1.8 3.4 0.2 3.5 290 11.5
7.19 3.21 40.1
37.85 18.9 20.89 53.2 75.9 169 4 67.4 40.1 58.6 15.8 7.9 12.1 53.2 141.6 0.7 0.1
1.9 13 51 34 11 28 64.0 1.9 0.2 12.5 45.7 7.8 3.8 22.9 77.9 3.1 2.5 0.69 45.6
390 14.8 7.2 1.1 40.1 5.4 0.1 13.3 59.9 7.9 1.7 56.6 58.9 1.5 5.5 0.6 1.9 350
10.2 7.1 3.7 42.1
64.12 20.7 19.04 43.9 74.3 133 5 55.9 38.7 60.0 12.1 6.3 12.6 47.4 159.5 1.0 0.1
1.6 12 48 33 13 26 62.0 6.4 1.1 12.6 33.2 10.1 1.8 43.8 99.3 1.5 2.8 0.57 20.5
470 15.1 7.1 0.7 39.8 4.9 0.3 10.9 65.5 7.1 2.3 42.1 66.4 1.1 3.9 0.7 2.9 288.9
12.2 7.2 4.1 35.6
90.28 14.6 19.52 56.9 61.9 145 7 66.5 33.2 59.1 13.7 4.7 10.0 52.1 241.1 0.8 0.1
1.9 13 47 32 10 30 57.0 5.5 0.3 11.7 31.2 3.5 2.4 50.4 65.7 1.9 2.5 0.88 41.9
398 14.3 7 2.4 38.7 3.1 0.5 14.1 71.9 8.4 1.9 36.7 59.1 0.9 2.6 0.5 2.7 290.7
13.6 7 2.6 43.7
111.18 13.2 16.53 61.3 78 127 6 49.9 41.7 51.6 14.7 4.7 10.0 55.8 148.9 0.7 0.1
1.9 14 46 37 11 28 60.0 3.9 0.7 11 21.7 4.3 0.9 33.7 73.6 2.3 2.7 0.78 33.4 349
15.2 7.4 0.9 39.1 2.8 0.1 12.5 76.6 8.9 3.1 32.1 71.4 0.5 6.9 0.5 2.5 256.9 15.1
7.5 1.7 36.7
59.11 12.9 21.34 45.3 78.9 178 6 63.3 39.8 52.0 19.5 4.2 8.9 54.7 229.5 0.7 0.1
1.7 13 46 36 12 30 62.0 2.7 0.7 12.9 19.5 2.6 2.8 61.2 86.9 2.2 3.7 0.86 27.8
400.5 17.1 7.1 2.1 39.9 3.9 0.3 11.9 57.4 6.7 1.6 30.1 89.9 1.8 5.8 0.3 3.7
224.8 12.9 7.3 5.5 34.9
80.89 17.9 15.86 40.3 66.8 154 7 45.6 36.8 47.8 21.6 3.7 12.6 54.7 162.1 0.7 0.1
1.9 11 50 35 13 31 61.0 2.9 0.4 10.9 37.9 7.1 1.9 19.5 55.8 2.8 2.9 0.66 45.1
459 15.6 7.2 0.8 36.1 5.1 0.4 13.1 85.5 5.7 2.1 29.1 92.1 1.9 7.8 0.7 2.8 278.9
11.8 7.2 6.1 32.7
122.74 16.6 17.29 43 77.8 140 6 32.9 37.7 55.6 20.0 4.2 12.1 47.4 152.6 0.9 0.1
1.7 14 49 36 11 25 58.0 3.2 1.5 11.5 24.5 3.9 3.7 20.8 56.9 1.9 2.6 0.72 40.7
398 16.3 7.3 0.7 37.7 2.2 0.1 11.8 55.5 6.1 1.8 39.9 69.9 2.1 2.4 0.3 1.8 312.8
10.6 7.4 7.3 31.2
20.08 7.6 18.28 33.8 60.5 81 1 11.3 37.0 59.8 21.3 3.2 4.9 67.5 153.4 0.5 0.1
0.7 16 46 38 15 45 40 7.7 0.8 11.2 139.8 10.7 1.3 80.7 106.6 2.4 2.4 0.6 10.9
345.8 14.8 7.5 12.2 36.3 6.3 0.7 10.5 108.2 9.8 0.5 63.7 84.4 1.3 2.5 0.4 3.3
213.0 15.7 7.6 10.5 38.2
22.09 11.8 14.44 40.3 61.4 67 2 13.5 34.6 58.7 21.6 2.6 7.6 57.4 186.5 0.5 0.1
0.6 13 48 35 12 30 59 6.8 0.8 12.1 45.8 5.6 1.0 26.1 37.1 1.3 2.5 0.6 4.3 407.2
17.5 7.7 4.7 34.5 6.9 0.8 12.0 65.9 7.4 0.7 38.2 50.9 0.9 2.9 0.4 1.9 265.3 18.3
7.6 6.8 35.2
25.04 8.4 15.29 46.3 65.5 60 1 11.0 33.2 56.2 21.4 2.2 10.1 54.6 167.8 0.4 0.1
0.8 13 48 35 12 30 58 11.3 1.2 10.4 79.7 10.4 0.8 60.6 61.9 0.9 2.2 0.5 3.6
341.3 16.3 7.6 8.3 33.3 9.0 1.0 10.8 82.0 11.8 0.5 65.7 60.5 0.5 2.1 0.3 0.7
205.0 17.9 7.6 8.8 35.1
27.77 23.9 17.80 58.6 80.1 77 2 16.7 37.1 54.9 17.6 3.8 2.7 43.3 108.3 0.4 0.1
1.1 13 48 35 12 31 57 5.3 0.4 8.3 117.0 14.5 2.0 96.7 71.8 0.9 2.4 0.5 3.9 395.0
15.0 7.7 10.0 32.8 11.8 0.9 8.1 73.4 10.7 1.6 57.9 58.1 0.4 3.1 0.3 1.1 193.8
15.8 7.7 8.0 30.1
34.03 8.7 21.02 50.3 68.7 58 1 15.5 36.8 55.9 18.4 2.5 4.6 41.3 141.2 0.4 0.1
0.9 13 48 36 12 31 57 4.4 0.5 10.2 34.1 4.7 1.3 22.4 30.8 0.7 2.5 0.8 4.8 446.4
14.8 7.7 4.3 33.4 4.0 0.4 10.5 58.8 7.6 0.8 38.9 46.6 0.4 2.3 0.5 1.7 316.7 15.7
7.7 6.4 34.2
David L Carlson
2012-Feb-13 16:36 UTC
[R] multi-regression with more than 50 independent variables
You need to spend some time reading about multiple regression. In statistics
there is always what is possible and what is advisable. I'm not going to
address whether a regression of 57 independent variables is advisable, only
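possible. For your data, it is not possible. The attached data contain only
13 observations, so with an intercept in the model the maximum number of
independent variables you can estimate is 12 (and 11 if you want any residual
degrees of freedom left for standard errors). Consider the following example: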
example <- data.frame(y=rnorm(3), x1=rnorm(3), x2=rnorm(3), x3=rnorm(3))
lm(y~x1 + x2, example)
lm(y~x1 + x2 + x3, example)
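We create four variables using random normal numbers for 3 cases (rows). The
first regression (2 independent variables) "works" (i.e., there are no NAs).
The second produces an NA for the third independent variable. In my example,
the three random variables are not correlated with one another, so the NA
comes purely from having too few cases. In your data, 13 observations allow
at most 12 slopes plus the intercept, and any exact linear dependence among
the 57 variables would reduce that number further. With 11 variables one
residual degree of freedom remains, which is why that model still gives
usable output.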
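The same point can be checked at the size of the posted data. The following
is only a sketch on simulated numbers, not the attached file: the data frame
`sim`, the helper `fit_k`, and the predictor names x1..x15 are made up for
illustration, and the predictors are assumed to be unrelated random normals.

```r
# Simulated stand-in for the posted data: 13 rows, 15 unrelated predictors.
# (Hypothetical data; the real file is not read here.)
set.seed(1)
n <- 13
sim <- as.data.frame(matrix(rnorm(n * 16), nrow = n))
names(sim) <- c("y", paste0("x", 1:15))

# Fit y on the first k predictors and report what lm() could estimate.
fit_k <- function(k) {
  f <- reformulate(paste0("x", seq_len(k)), response = "y")
  fit <- lm(f, data = sim)
  c(predictors  = k,
    estimated   = sum(!is.na(coef(fit))) - 1,  # slopes, excluding the intercept
    na_slopes   = sum(is.na(coef(fit))),       # coefficients lm() had to drop
    residual_df = df.residual(fit))            # what is left for standard errors
}

res <- t(sapply(c(11, 12, 13, 15), fit_k))
print(res)
```

With 13 rows, the table should show one residual degree of freedom at 11
predictors, none at 12, and NA slopes from 13 predictors onward. Running
nrow(mlr.data) and df.residual(mlr.output) on the real data is a quick way
to see the same limit there.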
----------------------------------------------
David L Carlson
Associate Professor of Anthropology
Texas A&M University
College Station, TX 77843-4352
-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of R DF
Sent: Monday, February 13, 2012 9:19 AM
To: r-help at r-project.org
Subject: [R] multi-regression with more than 50 independent variables