thr3ads.net - R help - [R] L1 penalized regression fails to predict from model [Jun 2016]

Fredrik Karlsson

2016-Jun-20 07:48 UTC

[R] L1 penalized regression fails to predict from model

Dear list,

Sorry for this cross-post from StackOverflow, but I see that SO was maybe
the wrong forum for this question. Too package specific and

Ok, what I am trying to do is to predict from an L1 penalized regression.
This fails due to a data set dimension problem that I cannot figure out.

The procedure I'm using is the following:

require(penalized)
require(dplyr)  # provides %>% and sample_frac(), used below
# neg contains negative data
# pos contains positive data

Now, the procedure below aims to construct comparable (balanced in terms of
positive and negative cases) training and validation data sets.

# 50% negative training set
negSamp <- neg %>% sample_frac(0.5) %>% as.data.frame()
# Negative validation set
negCompl <- neg[setdiff(row.names(neg),row.names(negSamp)),]
# 50% positive training set
posSamp <- pos %>% sample_frac(0.5) %>% as.data.frame()
# Positive validation set
posCompl <- pos[setdiff(row.names(pos),row.names(posSamp)),]
# Combine sets
validat <- rbind(negSamp,posSamp)
training <- rbind(negCompl,posCompl)
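
The split above relies on row names surviving the dplyr step, which may not hold in every dplyr version. As a minimal sketch under that assumption, the same balanced split can be done in base R by row index. The toy `neg` and `pos` data and the helper `split_half()` are hypothetical and only for illustration:

```r
set.seed(1)

# Toy stand-ins for the real `neg` and `pos` data frames (hypothetical data)
neg <- data.frame(VoiceTremor = "NVT", matrix(rnorm(40), nrow = 10))
pos <- data.frame(VoiceTremor = "VT",  matrix(rnorm(40), nrow = 10))

# Hypothetical helper: split a data frame into two halves by row index,
# so the complement does not depend on row names being preserved
split_half <- function(df) {
  idx <- sample(nrow(df), size = floor(nrow(df) / 2))
  list(samp = df[idx, , drop = FALSE], compl = df[-idx, , drop = FALSE])
}

negSplit <- split_half(neg)
posSplit <- split_half(pos)

# Combine sets, following the same naming as the procedure above
validat  <- rbind(negSplit$samp,  posSplit$samp)
training <- rbind(negSplit$compl, posSplit$compl)

# Together the two sets should cover every row exactly once
stopifnot(nrow(validat) + nrow(training) == nrow(neg) + nrow(pos))
```

Indexing with `idx` and `-idx` guarantees that the two halves are disjoint, whatever happens to the row names.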

Ok, so here we now have two comparable sets.

[1] FALSE  TRUE
> dim(training)
[1] 1061  381
> dim(validat)
[1] 1060  381
> identical(names(training),names(validat))
[1] TRUE
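
Matching names and dimensions do not guarantee that the columns have the same types in both sets. A further check that might be worth running, sketched here on toy data (the small `training` and `validat` frames below are hypothetical stand-ins, not the real data), compares column classes and factor levels:

```r
# Toy stand-ins for `training` and `validat` (hypothetical data)
training <- data.frame(VoiceTremor = c("NVT", "VT"),
                       X1 = c(0.1, 0.2),
                       X2 = factor(c("a", "b")))
validat  <- data.frame(VoiceTremor = c("NVT", "VT"),
                       X1 = c(0.3, 0.4),
                       X2 = factor(c("a", "a")))

# First class of every column in each set
cls_train <- sapply(training, function(x) class(x)[1])
cls_valid <- sapply(validat,  function(x) class(x)[1])

# Columns whose class differs between the two sets
class_mismatch <- names(cls_train)[cls_train != cls_valid]

# Factor columns whose levels differ between the two sets
is_fac <- names(training)[sapply(training, is.factor)]
same_levels <- vapply(is_fac, function(n)
  identical(levels(training[[n]]), levels(validat[[n]])), logical(1))
level_mismatch <- is_fac[!same_levels]

print(class_mismatch)
print(level_mismatch)
```

Whether a mismatch of this kind has anything to do with the error below is an open question; the sketch only rules one possibility in or out.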

I fit the model to the training set without a problem (and I've tried using
a range of lambda1 values here). But predicting from the model on the
validation data set fails, with a rather odd error message.
> fit <- penalized(VoiceTremor,training[-1],data=training,lambda1=40,standardize=TRUE)
# nonzero coefficients: 13
> fit2 <- predict(fit, penalized=validat[-1], data=validat)
Error in .local(object, ...) :
  row counts of "penalized", "unpenalized" and/or "data" do not match

Just to make sure that this is not due to some NA's in the data set:
> identical(validat,na.omit(validat))
[1] TRUE

Oddly enough, I can generate some new data that is comparable to the real
data set:
> data.frame(VoiceTremor="NVT",matrix(rnorm(380000),nrow=1000,ncol=380)) -> neg
> data.frame(VoiceTremor="VT",matrix(rnorm(380000),nrow=1000,ncol=380)) -> pos
> dim(pos)
[1] 1000  381
> dim(neg)
[1] 1000  381

and run the procedure above, and then the prediction step works!

How come?

What could be wrong with my second (not training) data set?

Fredrik

-- 
"Life is like a trumpet - if you don't put anything into it, you
don't get
anything out of it."

