Displaying 20 results from an estimated 10000 matches similar to: "Performance measure for probabilistic predictions"
2009 Sep 07
2
Confused - better empirical results with error in data
Hi,
I have a strange one for the group.
We have a system that predicts probabilities using a fairly standard svm
(e1071). We are looking at probabilities of a binary outcome.
The input data is generated by a perl script that calculates a bunch of
things, fetches data from a database, etc.
We train the system on 30,000 examples and then test the system on an
unseen set of 5,000 records.
2009 Aug 25
1
Clogit or LRM?
Hello
I believe that I'm getting very close in my modeling application.
I've come across a challenge that I am unable to solve and would really
appreciate the group's opinion.
I've been using the val.prob function from the Design library (Thanks
Frank!!) to both evaluate and visualize my model.
From the scores and graph, it appears that my model is very accurate in
2009 Aug 04
1
Build a dataframe row by row?
Hi,
Time for another of my "newbie" questions.
Is it possible to build up a data.frame "row by row" as I go?
I'm going to be running a bunch of experiments (many in a loop) to test
different things. I'm using AUC as my main performance measure.
My thought was to add a row to a data.frame for each iteration and then
have a nice summary report at the end.
I found
2009 Aug 20
1
Calculating loess value
Hello,
I'm attempting to evaluate the accuracy of the probability predictions
for my model. As previously discussed here, the AUC is not a good
measure as I'm not concerned with classification accuracy but
probability accuracy.
It was suggested to me that the loess function would be a good measure
to look at.
I can see some libraries (Design) will plot the loess function as a
curve
2009 Aug 21
1
Question about validating predicted probabilities
Hello,
Frank was nice enough to point me to the val.prob function of the Design
library.
It creates a beautiful graph that really helps me visualize how well my
model is predicting probabilities.
By default, there are two lines on the graph
1) fitted logistic calibration curve
2) nonparametric fit using lowess
Right now, the nonparametric line doesn't look very good.
The
2009 Aug 04
1
Save model and predictions from svm
Hello,
I'm using the e1071 package for training an SVM. It seems to be working
well.
This question has two parts:
1) Once I've trained an SVM model, I want to USE it within R at a later
date to predict various new data. I see the write.svm command, but
don't know how to LOAD the model back in so that I can use it tomorrow.
How can I do this?
2) I would like to add the
2008 Jun 12
1
About Mcneil Hanley test for a portion of AUC!
Dear all
I am trying to compare the performance of several methods using the AUC0.1
and not the whole AUC (meaning I want to compare two AUCs whose x axis only
goes to 0.1, not 1).
I came to know about the Mcneil Hanley test from Bernardo Rangel Tura,
and I referred to the original paper for the calculation of "r", which is an
argument of the function cROC. I can only find the
2012 Jun 06
1
Data scientist // Berlin-based startup using probabilistic models in ecommerce
Fluidshopping is a Berlin-based startup working on a customer analytics
tool for online retailers.
Customer Lifetime Value (CLV) is the mythical 'magic number', the amount
of money a particular customer will ever bring in. Knowing your CLV makes
it trivial to:
- optimize marketing spend for different inbound channels,
- identify your highest value customers,
- identify those in danger
2008 Jul 17
1
Comparing differences in AUC from 2 different models
Hi,
I would like to compare differences in AUC from 2 different models, glm and gam, for predicting presence / absence. I know that in theory the model with a higher AUC is better, but what I am interested in is whether the increase in AUC from the glm model to the gam model is statistically significant. I also read quite extensive discussions on the list about ROC and AUC but I still didn't find
2009 Aug 30
1
SVM coefficients
Hello,
I'm using the svm function from the e1071 package.
It works well and gives me nice results.
I'm very curious to see the actual coefficients calculated for each
input variable. (Other packages, like RapidMiner, show you this
automatically.)
I've tried looking at attributes for the model and do see a
"coefficients" item, but printing it returns a NULL result.
2008 Jan 05
1
AUC values from LRM and ROCR
Dear List,
I am trying to assess the prediction accuracy of an ordinal model fit with
LRM in the Design package. I used predict.lrm to predict on an independent
dataset and am now attempting to assess the accuracy of these predictions.
From what I have read, the AUC is good for this because it is threshold
independent. I obtained the AUC for the fit model output from the c score (c
=
2009 Aug 02
2
Strange column shifting with read.table
Hi,
I am reading in a dataframe from a CSV file. It has 70 columns. I do
not have any kind of unique "row id".
rawdata <- read.table("r_work/train_data.csv", header=T, sep=",",
na.strings=0)
When training an svm, I keep getting an error
So, as an experiment, I wrote the data back out to a new file so that I
could see what the svm function sees.
2011 Jan 07
2
Stepwise SVM Variable selection
I have a data set with about 30,000 training cases and 103 variables.
I've trained an SVM (using the e1071 package) for a binary classifier
{0,1}. The accuracy isn't great.
I used a grid search over the C and G parameters with an RBF kernel to
find the best settings.
I remember that for least squares, R has a nice stepwise function that
will try combining subsets of variables to find
2008 Dec 04
2
Logistic Regression: variable selection based on p value?
Hi,
When I use logistic regression, each variable has a p value associated with
it. Do I only include the variables that have a statistically significant p
value (<0.05), or are there situations when I should include variables when
their p values are high? I had heard that if a variable has a high p value
but it's not the terminal variable, keep it; otherwise, take it out. Not
sure if
2010 Apr 29
2
can not print probabilities in svm of e1071
> x <- train[,c( 2:18, 20:21, 24, 27:31)]
> y <- train$out
>
> svm.pr <- svm(x, y, probability = TRUE, method="C-classification",
kernel="radial", cost=bestc, gamma=bestg, cross=10)
>
> pred <- predict(svm.pr, valid[,c( 2:18, 20:21, 24, 27:31)],
decision.values = TRUE, probability = TRUE)
> attr(pred, "decision.values")[1:4,]
2009 Sep 22
2
Pull Coefficients from MCMCpack models
Hi,
I've been testing some models with the MCMCpack library.
I can run the process and get a nice model "object". I can easily see
the summary and even plot it.
I can't seem to figure out how to:
1) Access the final coefficients in the model
2) Turn the coefficients into a model so I can then run predictions
using them.
A summary command will SHOW me the coefficients, but
2006 Nov 24
1
How to find AUC in SVM (kernlab package)
Dear all,
I was wondering if someone can help me. I am learning SVM for
classification in my research with the kernlab package. I want to know about
classification performance using Area Under Curve (AUC). I know the ROCR
package can do this job, but all the examples in the ROCR package already
include predictions, for example, ROCR.hiv {ROCR}. My problem is how to
produce predictions from the SVM and to find
2011 Feb 21
3
ROC from R-SVM?
Hi,
Does anyone know how I can show an ROC curve for R-SVM? I understand that in
R-SVM we are not optimizing over the SVM cost parameter. Any example ROC code
for R-SVM or guidance would be really useful.
Thanks, Angel.
2010 Jun 01
4
Plot multiple columns
I'm running a long MCMC chain that is generating samples for 22 variables.
I have each run of the chain as a row in a matrix.
So: Chain[,1] is the column with all the samples for variable one.
Chain[,2] is the column with all the samples for variable 2, etc.
I'd like to fit all 22 on a single page to print a nice summary. It is
OK if the graphs are small, I just need to show the
2007 Jan 24
2
Logistic regression model + precision/recall
Hi,
I am using a logistic regression model named lrm (Design).
Right now I am using Area Under Curve (AUC) for testing my model. But now I
have to calculate precision/recall of the model on test cases.
For lrm, precision and recall would be simply defined with the help of 2
terms below:
True Positive (TP) - Number of test cases where class 1 is given probability
>= 0.5.
False Negative (FP) -