thr3ads.net - similar to: "How to properly build model matrices"

Displaying 20 results from an estimated 8000 matches similar to: "How to properly build model matrices"

Choosing glmnet lambda values via caret

2012 Feb 10

Usually when using raw glmnet I let the implementation choose the lambdas. However, when training via caret::train the lambda values are predetermined. Is there any way to have caret defer the lambda choices to glmnet and thus choose the optimal lambda dynamically? -- Yang Zhang http://yz.mit.edu/

Custom caret metric based on prob-predictions/rankings

2012 Feb 10

I'm dealing with classification problems, and I'm trying to specify a custom scoring metric (recall at p, ROC, etc.) that depends on not just the class output but the probability estimates, so that caret::train can choose the optimal tuning parameters based on this metric. However, when I supply a trainControl summaryFunction, the data given to it contains only class predictions, so the

Survival statistics--displaying multiple plots

2007 May 03

Hello all! I am once again analyzing patient survival data with chronic liver disease. The severity of the liver disease is given by a number which is continuously variable. I have referred to this number as "meld"--model for end stage liver disease--which is the result of a mathematical calculation on underlying laboratory values. So, for example, I can generate a Kaplan-Meier plot

caret train and trainControl

2012 Nov 23

I am used to packages like e1071 where you have a tune step and then pass your tunings to train. It seems that with caret, tuning and training are both handled by train. I am using train and trainControl to find my hyperparameters like so: MyTrainControl = trainControl(method = "cv", number = 5, returnResamp = "all", classProbs = TRUE) rbfSVM <- train(label ~ ., data =

CARET: Any way to access other tuning parameters?

2013 Feb 13

The documentation for caret::train shows a list of parameters that one can tune for each classification/regression method. For example, for the method randomForest one can tune mtry in the call to train. But the function call to train random forests in the original package has many other parameters, e.g. sampsize, maxnodes, etc. Is there **any** way to access these parameters using train

Caret train with glmnet give me Error "arguments imply differing number of rows"

2013 Jun 11

Hello, I'm training on a data set with the caret package using an elastic net (glmnet). Most of the time train works fine, but when the data set grows in size I get the following error: Error en { : task 1 failed - "arguments imply differing number of rows: 9, 10" and several warnings like this one: 1: In eval(expr, envir, enclos) : model fit failed for Resample01 My call to train

use "caret" to rank predictors by random forest model

2011 Mar 07

Hi, I'm using the package "caret" to rank predictors using a random forest model and to draw a predictor importance plot. I used the commands below: rf.fit<-randomForest(x,y,ntree=500,importance=TRUE) ## "x" is a matrix whose columns are predictors, "y" is a binary response vector ## Then I got the ranked predictors by ranking

kriging problem(?)

2008 Jul 04

Hi, I have two spatial datasets Sa and Sb, both with lat-lon coordinates and from the same geographic area, but from different localities within the area (independent samples). Sa is biotic data, Sb is some environmental parameter (fertility). I 'know' that Sb affects Sa, but wonder at which scale. I tried different interpolations by creating different grids of Sb (e.g. 20x20 and 100x100

caret package: arguments passed to the classification or regression routine

2008 Sep 18

Hi, I am having problems passing arguments to method="gbm" using the train() function. I would like to train gbm using the Laplace distribution or the quantile distribution. Here is the code I used and the error: gbm.test <- train(x.enet, y.matrix[,7], method="gbm", distribution=list(name="quantile",alpha=0.5), verbose=FALSE,

CV in R

2017 Jun 04

H2O works well (very well) both on a desktop/laptop computer and on a cluster. On a desktop, plenty of RAM and many cores help. And you don't have to use Spark unless you need a real-time or "near real-time" solution. H2O has another solution for interacting with Spark (Sparkling Water). Even on a cluster, you can use "sparklyr" and

Inconsistent results between caret+kernlab versions

2013 Nov 15

I'm using caret to assess classifier performance (and it's great!). However, I've found that my results differ between R2.* and R3.* - reported accuracies are reduced dramatically. I suspect that a code change to kernlab ksvm may be responsible (see version 5.16-24 here: http://cran.r-project.org/web/packages/caret/news.html). I get very different results between caret_5.15-61 +

Train error:: subscript out of bounds

2011 Jan 24

Hi, I am trying to construct a svmpoly model using the "caret" package (please see code below). Using the same data, without changing any setting, I am just changing the seed value. Sometimes it constructs the model successfully, and sometimes I get an "Error in indexes[[j]] : subscript out of bounds". For example, when I set the seed to 357 the following code produced results only for 8

Random Seed Location

2018 Feb 26

Hi all, For some odd reason when running naïve Bayes, k-NN, etc., I get slightly different results (e.g., error rates, classification probabilities) from run to run even though I am using the same random seed. Nothing else (input-wise) is changing, but my results are somewhat different from run to run. The only randomness should be in the partitioning, and I have set the seed before this

Help with this error "kernlab class probability calculations failed; returning NAs"

2012 Nov 29

I have never been able to get class probabilities to work and I am relatively new to using these tools, and I am looking for some insight as to what may be wrong. I am using caret with kernlab/ksvm. I will simplify my problem to a basic data set which produces the same problem. I have read the caret vignettes as well as documentation for ?train. I appreciate any direction you can give. I

Random Seed Location

2018 Feb 27

In case you don't get an answer from someone more knowledgeable: 1. I don't know. 2. But it is possible that other packages that are loaded after set.seed() fool with the RNG. 3. So I would call set.seed just before you invoke each random number generation to be safe. Cheers, Bert Bert Gunter "The trouble with having an open mind is that people keep coming along and sticking
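The suggestion in point 3 can be sketched as follows. This is a minimal illustration in base R only; the seed value and the object names (`split_a`, `split_b`) are assumptions made for the example and do not come from the thread.

```r
# Illustrative sketch only: the seed value and object names are assumptions.
set.seed(2018)                       # seed immediately before the first random step
split_a <- sample(seq_len(100), 70)  # e.g. a hypothetical train/test partition

set.seed(2018)                       # reseed right before repeating the same step
split_b <- sample(seq_len(100), 70)

# Both draws start from the same RNG state, so they match exactly.
stopifnot(identical(split_a, split_b))
```

Reseeding directly before each random step means the draw does not depend on whatever consumed random numbers earlier in the session, which is the concern raised in point 2.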

NaiveBayes fails with one input variable (caret and klaR packages)

2009 Jun 30

Hello, We have a system which creates thousands of regression/classification models and in cases where we have only one input variable NaiveBayes throws an error. Maybe I am mistaken and I shouldn't expect to have a model with only one input variable. We use R version 2.6.0 (2007-10-03). We use caret (v4.1.19), but have tested similar code with klaR (v.0.5.8), because caret relies on

caret package: custom summary function in trainControl doesn't work with oob?

2012 Apr 13

Hi all, I've been using a custom summary function to optimise regression model methods using the caret package. This has worked smoothly. I've been using the default bootstrap resampling method. For bagging models (specifically randomForest in this case) caret can, in theory, use the out-of-bag (oob) error estimate from the model instead of resampling, which (in theory) is largely

caret: Error when using rpart and CV != LOOCV

2012 May 15

Hi, I got the following problem when trying to build an rpart model using anything but LOOCV. Originally, I wanted to use k-fold partitioning, but every partitioning except LOOCV throws the following warning: ---- Warning message: In nominalTrainWorkflow(dat = trainData, info = trainInfo, method = method, : There were missing values in resampled performance measures. ----- Below are some

Random Seed Location

2018 Mar 04

On Mon, Feb 26, 2018 at 3:25 PM, Gary Black <gwblack001 at sbcglobal.net> wrote: (Sorry to be a bit slow responding.) You have not supplied a complete example, which would be good in this case because what you are suggesting could be a serious bug in R or a package. Serious journals require reproducibility these days. For example, JSS is very clear on this point. To your question >

ssh -Y X-forwarding?

2013 Jun 04

On rare occasions I want to run a remote X command (like 'meld' to interactively merge changes in files) and normally 'ssh -Y remote_host' from a terminal in an NX/freenx window that is acting as my desktop to start and any X program subsequently started would open in a new window via X-forwarding - at least when the target is a 5.x host. I don't do it often enough to remember
