thr3ads.net - similar to: "different randomForest performance for same data"

Displaying 20 results from an estimated 200 matches similar to: "different randomForest performance for same data"

2008 Sep 24

rowSums()

Say I have the following data:
testDat <- data.frame(A = c(1,NA,3), B = c(NA, NA, 3))
> testDat
   A  B
1  1 NA
2 NA NA
3  3  3
rowSums() with na.rm=TRUE generates the following, which is not desired:
> rowSums(testDat[, c('A', 'B')], na.rm=T)
[1] 1 0 6
rowSums() with na.rm=F generates the following, which is also not desired:
> rowSums(testDat[, c('A',

2011 Jan 17

Replacing rows in a data frame

R-helpers, Below is a simple example of some output that I am getting while trying to work with a data frame in R 2.12.1 for Mac.
-----
> testdat <- data.frame(matrix(ncol=10, nrow=10))
> colnames(testdat) <- c('a','b','c','d','e','f','g','h','i','j')
> testdat[seq(1,10,3),] <-

2009 Jan 22

dimnames in pkg "ipred"

Hello List, I'm trying to make predictions using a bagged tree with the package ipred. I tried to follow the manual but I'm getting an error message. Also, browsing through the list archive, I didn't find any hint. Maybe someone can help me?
selbag <- bagging(SOIL_UNIT ~., data=traindat.bin, coob=TRUE)
Error in dimnames(X) <- list(dn[[1L]], unlist(collabs, use.names = FALSE)) :

2010 Sep 29

resampling issue

I am trying to get R to resample my dataset of two columns of age and length data for fish. I got it to work, but it is not resampling every replicate. Instead, it resamples my data once and then repeats it 5 times. Here is my dataset of 9 fish samples with an age and length for each one:
Age Length
2   200
5   450
6   600
7   702
8   798
5   453
4   399
1   120
2   202
Here is my code which resamples my

2011 Mar 12

Column order in stacking/unstacking

Dear R users, I'm having some problems with the stack() and unstack() functions, and wondered if you could help. I have a large data frame (400 rows x 2000 columns), which I need to reduce to a single column of values (and therefore 800000 rows), so that I can use it in other operations (e.g., generating predictions from a GLM object). However, the problem I'm having can be reproduced

2010 Sep 29

repeat a function

I have R randomly sampling my array made up of 2 columns of data. Here is my code randomly sampling 5 different rows from my dataset to create a new dataset of 8 rows of data: testdat<-growth[sample(5,8,replace=T),] Now I want to tell R to repeat this function 50 times and give me the output. I have been searching the internet and have been unable to figure this out. Any advice

2011 Feb 15

expected behavior when parsing lines with special characters

Say I have a tab-delimited table I want to read into R. What should I expect to happen if some of the entries contain the character " ' "? I thought it would read the file fine, but that is not what happens. Instead, all the values in between two " ' "s get read into one field, and things are just seriously messed up. Is this a bug, and besides removing the offending

2010 Oct 20

problem with predict(mboost,...)

Hi, I use an mboost model to predict my dependent variable on new data. I get the following warning message:
In bs(mf[[i]], knots = args$knots[[i]]$knots, degree = args$degree, : some 'x' values beyond boundary knots may cause ill-conditioned bases
The new predicted values are partly negative although the variable in the training data ranges from 3 to 8 on a numeric scale. In order to

2013 Jul 20

Different x-axis scales using c() in latticeExtra

Hi, I would like to combine multiple xyplots into a single, multipanel display. Using R 3.0.1 in Ubuntu, I have used c() from latticeExtra to combine three plots, but the x-axes of two plots are on a log scale and the other is on a normal scale. I have also included equispace.log=FALSE to clean up the tick labels. However, when I try all of these, the x-axis scale of the first panel is used

2011 Apr 13

predict()

Hi, I am experimenting with the function predict() in two versions of R and the R extension package "survival".
library(survival)
set.seed(123)
testdat=data.frame(otime=rexp(10),event=rep(0:1,each=5),x=rnorm(10))
testfm=as.formula('Surv(otime,event)~x')
testfun=function(dat,fm) { predict(coxph(fm,data=dat),type='lp',newdata=dat) }
# Under R 2.11.1 and

2002 Nov 26

degenerate cases in RPART

RPART doesn't seem to handle the degenerate case when all training samples are drawn from a single class:
> TrainType
[1] 0 0 0 0
> TrainDat
         V1        V2        V3        V4        V5
1 0.6434392 0.5105860 0.3048803 0.3161728 0.5449632
2 0.1710005 0.5973921 0.1267061 0.6146834 0.7299928
3 0.6919125 0.8880789 0.9123243 0.9061885 0.9553663
4 0.3094843 0.6475508

2009 Dec 16

rcart - classification and regression trees (CART)

Hi, I am trying to use CART to find an ideal cut-off value for a simple diagnostic test (i.e., when the test score is above x, diagnose the condition). When I put in the model fit=rpart(outcome ~ predictor1(TB144), method="class", data=data8) sometimes it gives me a tree with multiple nodes for the same predictor (see below for example of tree with 1 or multiple nodes). Is there a way

2007 Aug 01

Predict using SparseM.slm

Hi, I am trying out the SparseM package and had a question. The following piece of code works fine:
...
fit = slm(model, data = trainData, weights = weight)
...
But how do I use the fit object to predict the values on, say, a reserved testDataSet? In the regular lm function I would do something like this:
predict.lm(fit,testDataSet)
Thanks -Bala

2005 Mar 10

Logistic regression goodness of fit tests

I was unsure of what suitable goodness-of-fit tests existed in R for logistic regression. After searching the R-help archive I found that the Design models and resid could be used to calculate this as follows:
d <- datadist(mydataframe)
options(datadist = 'd')
fit <- lrm(response ~ predictor1 + predictor2..., data=mydataframe, x =T, y=T)
resid(fit, 'gof')
I set up a

2009 Jul 15

strategy to iterate over repeated measures/longitudinal data

Hi Group, Create some example data.
set.seed(1)
wide_data <- data.frame(
  id=c(1:10),
  predictor1 = sample(c("a","b"),10,replace=TRUE),
  predictor2 = sample(c("a","b"),10,replace=TRUE),
  predictor3 = sample(c("a","b"),10,replace=TRUE),
  measurement1=rnorm(10),
  measurement2=rnorm(10))
head(wide_data)
id

2005 Jan 24

Follow-up on nls convergence failure with SSfol

A couple of weeks ago there was a question regarding apparent convergence in nls when using the SSfol selfStart model for fitting a first-order pharmacokinetic model. I can't manage to find the original message either in my archive or in the list archives but the data were
time  conc  dose
0.50  5.40  1
0.75 11.10  1
1.00  8.40  1
1.25 13.80  1
1.50 15.50  1

2008 May 30

Get all X iterations in optim output when controls(trace=6)

Hi, I would like to get all X iterations in optim output in matrix form. I know about the following approach:
sink("reportOptim")
optim( ......., control=list( trace=6,..........) )
sink()
all_iterOptim <- readLines("reportOptim")
unlink("reportOptim")
all_iterOptim <- all_iterOptim[ grep( '^X', all_iterOptim ) ]
### TODO: the rest !!! :-)
But it is very

2012 Jul 11

MODE, VARIANCE, NTH PERCENTILE

Hi, Here I have a matrix like this,
ABC PQR XYZ MNO
--- --- --- ---
3   6   7   15
2   12  24  15
20  5   1   2
25  50  15  35
I need to get the "MODE" - for each column-wise "VARIANCE" - for

2009 Dec 14

GBM package: Extract coefficients

I am using the gbm package for generalized boosted regression models, and would like to be able to extract the coefficients produced for storage in a database. I am already using R to automatically generate formulas that I can export to a database and store. For example, I have been using Dr. Harrell's lrm package to perform logistic regression, e.g.: output <-
