Hi,
I am writing to seek guidance on using lasso regression with the R package
lars. I have an introductory statistics background and am trying to learn
more. For a bioinformatics project we are working on, I am trying to
reproduce the results of a paper on shRNA prediction: "An accurate and
interpretable model for siRNA efficacy prediction", Jean-Philippe Vert et
al., Bioinformatics. The authors use lasso regression, and based on their
paper this is what I have so far:
xtrain <- trainData
dim(trainData)
# [1] 18520    88
ytrain <- trainScore
length(ytrain)
# [1] 18520
nfolds <- 100
epsilon <- exp(-10)
# Code from JP Vert
object1 <- cv.lars(xtrain, ytrain, K = nfolds,
                   fraction = seq(from = 0, to = 1, length = 1000),
                   type = 'lasso', eps = epsilon, plot.it = TRUE)
bestfraction <- object1$fraction[min(which(object1$cv <= min(object1$cv) +
                   0.01 * (max(object1$cv) - min(object1$cv))))]
# The next line gives bestcoef as NULL; I don't know why.
bestcoef <- coef(object1, s = bestfraction, mode = 'fraction')
# End of code from JP Vert
# Code by me (I am not sure this is correct):
predictor <- lars(trainData, trainScore, type = 'lasso', eps = epsilon,
                  trace = TRUE)
reslt <- predict.lars(predictor, trainData, s = bestfraction, mode = "lambda",
                      type = "coefficients")
My aim is to extract the coefficients of the m variables from the regression
at the best fraction.
I am a bit confused by the last two lines of code above, which I pieced
together from the lars manual. Here is where I have trouble following:
1) cv.lars() returns a list. How can I get from that to a lars object that
I can then pass to predict.lars() together with "bestfraction"?
2) In the code above I passed "s=bestfraction" to predict.lars(), but my
call to lars() ran only 69 iterations, whereas cv.lars() was given 1000
fractions. How do I tell lars() to evaluate 1000 fractions as cv.lars() does,
and pick the best one? Again, I could be completely wrong here because of my
limited understanding of LARS. Sorry :(
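To make the two questions concrete, here is a minimal sketch of the workflow being asked about, run on simulated data rather than on trainData. The data, the sizes n and p, and the names cvres, grid and fit are made up for illustration. It assumes that the cross-validation grid is stored as $index in recent lars releases and as $fraction in older ones, and it picks the fraction with the smallest CV error rather than applying the 1% rule used above.

```r
# Illustrative sketch on simulated data; x, y, n and p are made up.
library(lars)

set.seed(1)
n <- 200
p <- 10
x <- matrix(rnorm(n * p), n, p)
y <- 2 * x[, 1] - x[, 3] + rnorm(n)

# Cross-validate over a grid of fractions (default grid, 10 folds).
cvres <- cv.lars(x, y, K = 10, type = "lasso", plot.it = FALSE)

# Assumption: the grid is called $index in recent lars releases and
# $fraction in older ones.
grid <- if (!is.null(cvres$index)) cvres$index else cvres$fraction
bestfraction <- grid[which.min(cvres$cv)]

# cv.lars() returns a plain list, so the full path is fitted separately.
fit <- lars(x, y, type = "lasso")

# The lasso path is piecewise linear, so the coefficients at any fraction
# are interpolated between the stored steps of the fit.
bestcoef <- coef(fit, s = bestfraction, mode = "fraction")
```

If this reading is right, the number of steps reported by lars() and the number of fractions given to cv.lars() do not need to match, because a fraction is looked up by interpolation along the fitted path.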
I apologize in advance for my ignorance of the statistics involved. I am
reading the LARS paper, but it will take me some time to work this out from
it, so I am asking for your help. Thank you very much in advance for your
reply.
Sincerely,
Vishal
### Summary of objects generated in R ###
> summary(reslt)
             Length Class  Mode
s             1     -none- numeric
fraction      1     -none- numeric
mode          1     -none- character
coefficients 88     -none- numeric
> summary(object1)
         Length Class  Mode
fraction 1000   -none- numeric
cv       1000   -none- numeric
cv.error 1000   -none- numeric
> bestfraction
[1] 0.7687688
> summary(predictor)
LARS/LASSO
Call: lars(x = trainData, y = trainScore, type = "lasso", trace = TRUE,
    eps = epsilon)
Df Rss Cp
0 1 2139108 849.989
1 2 2113344 618.713
2 3 2108060 572.873
3 4 2107447 569.322
4 5 2106816 565.603
5 6 2099451 500.922
6 7 2098693 496.061
7 8 2098506 496.366
8 9 2096066 476.274
9 10 2095918 476.929
10 11 2092055 443.957
11 12 2089956 426.950
12 13 2085501 388.615
13 14 2084952 385.642
14 15 2084392 382.573
15 16 2081925 362.239
16 17 2080698 353.128
17 18 2080401 352.438
18 19 2079909 349.988
19 20 2078895 342.801
20 21 2077178 329.261
21 22 2076617 326.181
22 23 2076388 326.108
23 24 2072939 296.872
24 25 2072099 291.270
25 26 2071479 287.659
26 27 2070436 280.211
27 28 2069626 274.876
28 29 2069571 276.384
29 30 2068567 269.293
30 31 2068424 269.994
31 32 2063186 224.574
32 33 2063144 226.192
33 34 2062767 224.774
34 35 2061369 214.117
35 36 2060113 204.742
36 37 2059941 205.190
37 38 2058845 197.266
38 39 2056762 180.409
39 40 2054715 163.869
40 39 2054413 159.141
41 40 2052346 142.426
42 41 2052231 143.384
43 42 2051759 141.107
44 43 2051520 140.945
45 44 2051438 142.197
46 45 2051072 140.890
47 46 2049811 131.467
48 47 2049294 128.787
49 48 2048110 120.069
50 49 2045617 99.494
51 50 2044817 94.255
52 51 2044323 91.780
53 52 2044285 93.434
54 53 2043975 92.625
55 54 2043726 92.373
56 55 2042600 84.175
57 56 2042513 85.395
58 57 2040895 72.745
59 58 2040614 72.196
60 59 2040234 70.755
61 60 2039718 68.081
62 61 2039042 63.966
63 62 2039037 65.919
64 63 2038989 67.481
65 64 2038891 68.594
66 65 2038460 66.692
67 66 2038198 66.325
68 67 2038052 67.000