thr3ads.net - similar to: "Using predict.glm for classification"

Displaying 20 results from an estimated 6000 matches similar to: "Using predict.glm for classification"

Finding points with equal probability between normal distributions

2006 Aug 07

Dear mailing list,
For two normal distributions, e.g.:
    r1 = rnorm(20, 5.2, 2.1)
    r2 = rnorm(20, 4.2, 1.1)
    plot(density(r2), col="blue")
    lines(density(r1), col="red")
Is there a way in R to compute/estimate the point(s) x where the densities of the two distributions cross (i.e. where x has equal probability of belonging to either of the two distributions)?
Many thanks,
Eleni

memory problems when combining randomForests [Broadcast]

2006 Jul 27

You need to give us more details, like how you call randomForest, versions of the package and R itself, etc. Also, see if this helps you: http://finzi.psych.upenn.edu/R/Rhelp02a/archive/32918.html
Andy
From: Eleni Rapsomaniki
> Dear all,
>
> I am trying to train a randomForest using all my control data
> (12,000 cases, ~ 20 explanatory variables, 2 classes).
> Because

Multiple imputation using mice with "mean"

2006 Sep 25

Hi,
I am trying to impute missing values for my data.frame. As I intend to use the complete data for prediction, I am currently measuring the success of an imputation method by its resulting classification error in my training data. I have tried several approaches to replace missing values:
- mean/median substitution
- substitution by a value selected from the observed values of a variable
- MLE

converting upper triangular matrix into vector

2005 Oct 02

Hi, I have two symmetric distance matrices and want to compute the correlation coefficient between them (after turning them into vectors). Is there a way to select only the upper triangular part of each matrix and convert it into a vector so I can compute the correlation? Many thanks, Eleni Rapsomaniki

summary(glm) for categorical variables

2006 Sep 11

Dear list people,
Suppose we have a data.frame where the variables are categorical and the response is categorical, e.g.:
    my.df=NULL
    for(i in LETTERS[1:3]){my.df[[i]]=sample(letters, size=10)}
    my.df=data.frame(my.df)
    my.df$class=factor(rep(c("pos", "neg"), times=5))
    my.glm=glm(class ~ ., data=my.df, family=binomial)
    summary(my.glm)
    .... Estimate Std. Error z

Any hot-deck imputation packages?

2006 Sep 27

Hi I found on google that there is an implementation of hot-deck imputation in SAS: http://ideas.repec.org/c/boc/bocode/s366901.html Is there anything similar in R? Many Thanks Eleni Rapsomaniki

RandomForest vs. bayes & svm classification performance

2006 Jul 24

Hi, this is a question regarding classification performance using different methods. So far I've tried NaiveBayes (klaR package), svm (e1071 package) and randomForest (randomForest package). What has puzzled me is that randomForest seems to perform far better (32% classification error) than svm and NaiveBayes, which have similar classification errors (45% and 48% respectively). A similar difference in

how to combine imputed data-sets from mice for classification

2006 Oct 30

Dear R users,
I want to combine multiply imputed data-sets generated from mice to do classification. However, I have various questions regarding the use of the mice library. For example, suppose I want to predict the class in this data.frame:
    data(nhanes)
    mydf=nhanes
    mydf$class="pos"
    mydf$class[sample(1:nrow(mydf), size=0.5*nrow(mydf))]="neg"
    mydf$class=factor(mydf$class)
First I

R: Grouping columns in a data frame based on the values of a column

2006 Sep 15

Perhaps using 'ave' and 'cut':
    df <- data.frame(x=runif(100, 0.1, 1), y=rnorm(100, 0.2, 0.6))
    df$xcut <- cut(df$x, seq(0, 1, 0.1))
    df$z <- ave(df$y, df$xcut)
    df[order(df$x),]
Stefano
-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On behalf of e.rapsomaniki at mail.cryst.bbk.ac.uk
Sent: Friday 15

Who uses R?

2007 Sep 25

Dear R users, I have started work in a Statistics government department and I am trying to convince my bosses to install R on our computers (I can't do proper stats in Excel!!). They asked me to prove that this is a widely used software (and not just another free-source, bug infected toy I found on the web!) by suggesting other big organisations that use it. Are you aware of any reputable

Grouping columns in a data frame based on the values of a column

2006 Sep 15

Dear R users,
This is a trivial question, and there might even be an R function for it, but I have to do it many times and wonder if there is an efficient way to do it. Suppose we have a data frame like this:
    d <- data.frame(x=sample(seq(0.1, 1, by=0.01), size=100, replace=TRUE), y=rnorm(100, 0.2, 0.6))
and want to have the average of y for a given interval of x, for example mean(y) for 0 < x < 0.1. Is

for loops and counter interpolation

2006 May 17

Hi,
I'm sorry about the triviality of my problem. I have a vector (v) of three columns (logA, logB, id). I want to compute (and plot) the correlation between logA and logB for different thresholds of id (e.g. >30, etc). So I tried:
    for(i in 1:100){ points(cor(v$logA[v$id>i], v$logB[v$id>i], use="complete.obs"), i)) }
(I created a plot object already) but it comes with

See source code for survplot function in Design package

2009 Feb 05

Dear R users, I know one way to see the code for a hidden function, say function_x, is using function_x.default (e.g. summary.default). But how can I see the code for imported packages that have no namespace (in this case Design)? Many thanks, Eleni

correlation between rows of data.frame

2008 Aug 01

Dear R users, I need to come up with an efficient method to compute the correlation (or at least, the euclidean distance if that's easier) between specific rows in a data frame (46,232 rows, 29 columns). The pairs of rows between which I want to find the correlation share a common value in one of the columns. So for example, in the following

How do I compute predicted failure time from a cox model?

2009 Feb 16

Given a Cox model:
    library(Hmisc); library(survival); library(Design)
    cox.model=cph(Surv(futime, fustat) ~ age, data=ovarian, surv=TRUE)
    str(cox.model)
What I need is the total estimated time until failure (death), not the probability of failing at a given time (survival probability), or hazard etc., which is what I get from survest and predict for example. I suspect the answer is

Competing risks adjusted for covariates

2009 Feb 27

Dear R-users Has anybody implemented a function/package that will compute an individual's risk of an event in the presence of competing risks, adjusted for the individual's covariates? The only thing that seems to come close is the cuminc function from cmprsk package, but I would like to adjust for more than one covariate (it allows you to stratify by a single grouping vector). Any

update.formula drop interaction terms

2009 Oct 13

Dear R users,
How do I drop multiplication terms from a formula using update? e.g.
    forml=as.formula("Surv(time, status) ~ x1+x2+A*x3+A*x4+B*x5+strata(sex)")
    #I would like to drop all instances of variable A (the main effect and its interactions).
The following:
    updated.forml=update(forml, ~ . -A)
    #gives me this:
    #Surv(time, status) ~ x1 + x2 + x3 + x4 + B + x5 + strata(sex) + A:x3 +

Joint confidence interval for fractional polynomial terms

2012 Jan 09

Dear R users,
The package 'mfp' fits fractional polynomial terms to predictors. Example:
    data(GBSG)
    f <- mfp(Surv(rfst, cens) ~ fp(age, df = 4, select = 0.05) + fp(prm, df = 4, select = 0.05), family = cox, data = GBSG)
    print(f)
To describe the association between the original predictor, e.g. age, and risk for different values of age I can plot it the polynomials

Calibration score for survival probability

2009 Nov 23

Good afternoon! I need to evaluate the goodness-of-fit (aka calibration) for survival probability estimates from a Cox model. I tried to use 'calibrate' in the Design package but I'm not sure if it should/would produce what I need (ie a chi-sq type statistic with a table of expected vs observed probabilities). Any other functions I should be aware of? Also, has anybody come across

survfit using quantiles to group age

2009 Feb 02

I am using the package Design for survival analysis. I want to plot a simple Kaplan-Meier fit of survival vs. age, with age grouped as quantiles. I can do this:
    survplot(survfit(Surv(time,status) ~ cut(age,3), data=veteran))
but I would like to do something like this:
    survplot(survfit(Surv(time,status) ~ quantile(age,3), data=veteran)) #will not work
Ideally I would like to superimpose
