Hi,

I have a fairly simple linear regression using the lm function. There are about 100 variables and 30,000 rows of data. It runs fine and produces a decent-looking R2 value. I'm interested in performing a stepwise variable selection to see if things can be cleaned up a bit.

Calling the step function returns ONE iteration (all the variables) and then stops. No errors are reported.

Can someone suggest why this might not be working as expected? (Normally this function steps through all the variables to find the "best" combination.)

Thanks!

-N
Hi Noah,

Are you able to reproduce the example on a smaller dataset? Do you have any strange variable names? I created a 30000 x 100 matrix, fit a linear model, and step has been running fine (other than bringing my poor netbook to its knees).

It also might be helpful if you could post your session info per the posting guide. You could also try debug(step), then run step on your model so you can see what the function does before it exits.

Cheers,

Josh

On Jan 9, 2011, at 23:57, Noah Silverman <noah at smartmediacorp.com> wrote:

> Hi,
>
> I have a fairly simple linear regression using the lm function. There
> are about 100 variables and 30,000 rows of data. It runs fine and
> produces a decent looking R2 value. I'm interested in performing a
> stepwise variable selection to see if things can be cleaned up a bit.
>
> Calling the step function returns ONE iteration (all the variables) and
> then stops. No errors are reported.
>
> Can someone suggest why this might not be working as expected.
> (Normally this function steps through all the variables to find the
> "best" combination.)
>
> Thanks!
>
> -N
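A scaled-down version of the check described above might look like the sketch below. The sizes, the column names, and the made-up response are assumptions chosen so it runs in a second; it is not Noah's data or the exact test Josh ran.

```r
# Small synthetic stand-in for the real data (sizes and names are made up)
set.seed(1)
n <- 500
p <- 10
dat <- as.data.frame(matrix(rnorm(n * p), nrow = n))   # columns V1..V10
dat$y <- 2 * dat$V1 - dat$V2 + rnorm(n)                # only V1 and V2 matter

fit <- lm(y ~ ., data = dat)     # one model term per column
sel <- step(fit, trace = 0)      # stepwise selection by AIC, printing suppressed

formula(sel)                     # terms that survived the selection
sessionInfo()                    # version details, as the posting guide asks

# To watch step() line by line on a model that misbehaves:
# debug(step); step(fit); undebug(step)
```

If step walks through several iterations here but not on the real model, the difference is likely in how the real model was specified rather than in step itself.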
Hi,

It's a lot of data, but here are some summary stats:

> l <- lm(trainy ~ x)
> str(x)
 num [1:31205, 1:48] 0.0975 -0.1987 0.3254 -0.7912 0.0975 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:31205] "5" "6" "7" "8" ...
  ..$ : NULL
 - attr(*, "names")= chr [1:1497840] "a" NA NA NA ...
> summary(x)

                  V1          V2          V3          V4          V5          V6
Min.       -1.679848   -1.606698   -1.617491  -1.6534404    -0.93052    -1.66594
1st Qu.    -0.865216   -0.867430   -0.875567  -0.9042894    -0.67904    -0.90768
Median      0.074739   -0.004886   -0.009924   0.0946436    -0.40504    -0.14942
Mean        0.000492   -0.001140   -0.001563  -0.0006543    -0.01372     0.01700
3rd Qu.     0.826709    0.857625    0.855687   0.8438270     0.23305     0.79841
Max.        1.578680    1.596925    1.597644   1.5930105     2.74787     2.88363

                  V7          V8          V9         V10         V11         V12
Min.        -2.84607  -17.340329    -5.72374   -9.088574   -0.753625   -9.694224
1st Qu.     -0.69230   -0.680686    -0.77093   -0.484832   -0.753625   -0.535022
Median       0.07690   -0.050236     0.08103    0.127993   -0.187126    0.094031
Mean        -0.01912    0.007672    -0.01086    0.004137    0.001845    0.005425
3rd Qu.      0.69226    0.643260     0.70906    0.646475    0.232864    0.640222
Max.         1.76915    4.299870     3.87579    4.307299    8.125662   13.955377

                 V13         V14         V15         V16         V17         V18
Min.       -2.325326   -1.122704   -15.78010    -1.41451   -2.890895    -6.48201
1st Qu.    -0.707599   -0.677653     0.10818    -0.67008   -0.562810    -0.65572
Median      0.022490   -0.249277     0.29841    -0.24738   -0.068975    -0.01222
Mean        0.000984    0.005968    -0.01914    -0.01929   -0.004446    -0.04004
3rd Qu.     0.735969    0.387072     0.38232     0.32839    0.502638     0.59069
Max.        2.328877   10.034416     1.17948     3.66491    3.405497     3.95314

                 V19         V20         V21         V22         V23         V24
Min.      -3.4866219   -53.84720   -3.872473  -82.470612   -0.877362     -0.9064
1st Qu.   -0.6866883    -0.57941   -0.459875   -0.546812   -0.556758     -0.6743
Median     0.0181297    -0.01640   -0.026090   -0.023271   -0.283361     -0.2101
Mean       0.0005746     0.02152    0.001832   -0.002836    0.006677      0.0330
3rd Qu.    0.7036093     0.58834    0.400639    0.501094    0.196238      0.4863
Max.       3.5553623    53.96102    5.111946    7.022679   21.385854     12.3242

                 V25         V26         V27         V28         V29         V30
Min.        -0.88375    -1.11709    -1.00780    -10.7395    -1.66934  -1.0292617
1st Qu.     -0.65752    -0.71563    -0.70467     -0.1804    -0.46190  -0.6029130
Median      -0.20505    -0.07946    -0.14171      0.2798    -0.12636  -0.3733405
Mean         0.03226     0.02066     0.01787     -0.0344     0.01104   0.0004641
3rd Qu.      0.47365     0.48877     0.42125      0.5117     0.32533   0.0530082
Max.        10.88045    11.39008    11.55056      1.2400    76.74103   5.4643580

                 V31         V32         V33         V34         V35         V36
Min.        -1.72330    -2.81647    -1.22587    -1.33872    -0.85680    -1.84229
1st Qu.     -0.95858    -0.68389    -0.79860    -0.85541    -0.66622    -0.81453
Median      -0.19386     0.07774    -0.18821    -0.18663    -0.37654    -0.25103
Mean         0.01799    -0.01678     0.01022    -0.07883    -0.05283    -0.01440
3rd Qu.      0.76204     0.68705     0.54426     0.53015     0.25618     0.62855
Max.         2.86501     1.75334     4.57282     2.78523     3.86957     5.99709

                 V37         V38         V39         V40         V41         V42
Min.       -0.457517     -2.2722     -1.6455   -3.477135    -1.17361   -5.151515
1st Qu.    -0.457517     -0.8465     -0.8011   -0.687784    -1.17361   -0.057516
Median     -0.457517     -0.2618     -0.3438   -0.229916     0.03988   -0.057516
Mean       -0.001647     -0.2080     -0.1453    0.007545     0.02236    0.001137
3rd Qu.    -0.457517      0.3710      0.3013    0.515931     0.49494    0.706584
Max.       15.512632      2.1959      2.7406    3.717934     6.15788    5.036483

                 V43         V44         V45         V46         V47         V48
Min.          0.0000   -0.708214  -0.5407803   -0.980665  -17.332960   -0.291151
1st Qu.       0.0000   -0.708214  -0.5407803   -0.980665   -0.684639   -0.291151
Median        0.0000   -0.286641  -0.2274321   -0.416754   -0.054618   -0.291151
Mean          0.1500   -0.001202  -0.0006913    0.004792    0.007181    0.008824
3rd Qu.       0.0000    0.313288   0.1232400    0.711067    0.654157   -0.291151
Max.          1.0000   30.801619  45.7742768    5.786264    4.292532   10.602908
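
One thing the str(x) output may point to, offered as a possibility rather than a confirmed diagnosis: x is a single numeric matrix, and a matrix on the right-hand side of an lm formula counts as one model term. step adds and drops whole terms, so it would have only one candidate to consider. The sketch below uses small made-up data (the object names fit_matrix, fit_cols and dat are hypothetical) to show the difference between the two ways of specifying the model.

```r
# Made-up data: a 200 x 5 matrix of predictors, response driven by column 1
set.seed(1)
x <- matrix(rnorm(200 * 5), nrow = 200)
y <- x[, 1] + rnorm(200)

# Matrix on the right-hand side: the whole matrix is one term
fit_matrix <- lm(y ~ x)
attr(terms(fit_matrix), "term.labels")

# Data frame with one column per predictor: one term per column
dat <- data.frame(y = y, x)
fit_cols <- lm(y ~ ., data = dat)
attr(terms(fit_cols), "term.labels")

# step() now has individual columns it can drop
sel <- step(fit_cols, trace = 0)
formula(sel)
```

If this is what is happening, converting x to a data frame and fitting with trainy ~ . would give step one term per column to work with.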