thr3ads.net - similar to: "partitioning data"

Displaying 20 results from an estimated 20000 matches similar to: "partitioning data"

partitioning data [SEC=UNCLASSIFIED]

2007 Oct 16

Hi Stephen,

Check the help for predict.glm(). The argument for passing new data is actually 'newdata', as in:

> pred = predict(glm.model, newdata=form[150001:200000,-1], type="response")

Cheers
Joe

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of stephenc at ics.mq.edu.au
Sent: Tuesday, 16
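Here is a minimal sketch of Joe's point on simulated data. The columns finished, x1 and x2 are hypothetical stand-ins for the poster's form data frame. New rows must be passed through newdata. A misnamed argument such as data = is not a formal argument of predict.glm(), so it is absorbed by '...' and ignored, and the fitted values for the training rows come back instead.

```r
# Simulated stand-in for the poster's data; column names are hypothetical
set.seed(1)
n <- 200
form <- data.frame(finished = rbinom(n, 1, 0.5), x1 = rnorm(n), x2 = rnorm(n))

# Fit on the first 150 rows (the response is column 1)
glm.model <- glm(finished ~ ., family = binomial, data = form[1:150, ])

# Correct: the hold-out rows go through 'newdata' (response column dropped)
pred <- predict(glm.model, newdata = form[151:200, -1], type = "response")

# Misnamed: 'data' is not an argument of predict.glm(), so it falls into '...'
# and is ignored; the fitted values for the 150 training rows are returned
wrong <- predict(glm.model, data = form[151:200, -1], type = "response")

length(pred)   # one prediction per new row
length(wrong)  # one value per training row
```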

problems with glm

2007 Oct 02

I am having a couple of problems someone may be able to cast some light on.

Question 1: I am making a logistic model, but when I do this:

glm.model = glm(as.factor(form$finished) ~ ., family=binomial, data=form[1:150000,])

I get this:

Error in model.frame(formula, rownames, variables, varnames, extras, extranames,  :
  variable lengths differ (found for 'barrier')

which is
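The error arises because as.factor(form$finished) is evaluated on the full form data frame, while every variable picked up by ~ . comes from the 150000-row subset. The lengths therefore disagree. The sketch below uses simulated data, and the column names other than finished and barrier are hypothetical. It reproduces the message and shows one fix: convert the column first, then refer to it by name so it is taken from the same subset.

```r
# Small simulated version of the poster's setup; 'speed' is a hypothetical column
set.seed(2)
n <- 200
form <- data.frame(finished = rbinom(n, 1, 0.4), barrier = rnorm(n), speed = runif(n))

# Fails: form$finished has 200 values, but data= holds only 150 rows
err <- tryCatch(
  glm(as.factor(form$finished) ~ ., family = binomial, data = form[1:150, ]),
  error = function(e) conditionMessage(e)
)
print(err)

# Works: make the column a factor, then name it so it comes from the same subset
form$finished <- factor(form$finished)
glm.model <- glm(finished ~ ., family = binomial, data = form[1:150, ])
coef(glm.model)
```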

boosting - second posting

2006 May 27

Hi,

I am using boosting for a classification and prediction problem. For some reason it is giving me predictions that don't fall between 0 and 1. I have tried type="response" but it made no difference. Can anyone see what I am doing wrong? Screen output shown below:

> boost.model <- gbm(as.factor(train$simNuance) ~ ., # formula
+
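One common reason boosting predictions fall outside [0, 1] is that they are on the link scale. For a Bernoulli loss this is the log-odds scale. Whether the gbm version in the post supported a response-scale option is not clear from the snippet, so treat this as an assumption. Either way, log-odds scores can be mapped to probabilities with the inverse logit, plogis(). The scores below are hypothetical.

```r
# Hypothetical link-scale scores (log-odds), e.g. raw output of a Bernoulli boosting model
f <- c(-2.3, 0, 1.7, 4.1)

# The inverse logit maps any real number into (0, 1)
p <- plogis(f)
p.manual <- 1 / (1 + exp(-f))

all.equal(p, p.manual)
round(p, 3)
```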

Band-wise Sum

2010 Aug 27

Hi,

I have a large credit portfolio (exceeding 50000 borrowers). For a particular process I need to add up the exposures based on the bands. I am giving a small test data set below.

rating <- c("A", "AAA", "A", "BBB","AA","A","BB", "BBB", "AA", "AA", "AA", "A", "A",
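One way to compute a band-wise sum in base R is tapply(). It splits a value vector by a grouping vector and applies a function to each group. The ratings below are the first ten from the post. The exposure amounts are made up for illustration.

```r
# First ten ratings from the post; the exposure amounts are hypothetical
rating   <- c("A", "AAA", "A", "BBB", "AA", "A", "BB", "BBB", "AA", "AA")
exposure <- c(10, 25, 5, 40, 15, 20, 8, 12, 30, 7)

# Sum the exposures within each rating band
band.sum <- tapply(exposure, rating, sum)
band.sum
```

For a data frame, aggregate(exposure ~ rating, data = ..., FUN = sum) gives the same totals as a data frame rather than a named vector.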

glm predict issue

2011 Dec 26

Hello,

I have tried reading the documentation and googling for the answer, but after reviewing the online matches I end up more confused than before. My problem is apparently simple. I fit a glm model (2^k experiment), and then I would like to predict the response variable (Throughput) for unseen factor levels. When I try to predict I get the following error:

> throughput.pred <-
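The snippet stops before the error message. Assuming it concerns factor levels in newdata that were not present at fitting time, the sketch below reproduces that situation. The factors A and B and their levels are hypothetical. Predictions at observed levels work. A level never seen during fitting has no coefficient, so model.frame() refuses it.

```r
# Hypothetical 2^2 design with two-level factors A and B
d <- data.frame(A = factor(c("lo", "hi", "lo", "hi")),
                B = factor(c("lo", "lo", "hi", "hi")),
                Throughput = c(10, 14, 11, 18))
fit <- glm(Throughput ~ A + B, data = d)

# Works: every level in 'newdata' was seen during fitting
ok <- predict(fit, newdata = data.frame(A = "hi", B = "hi"))

# Fails: level "mid" was never observed for factor A
err <- tryCatch(predict(fit, newdata = data.frame(A = "mid", B = "hi")),
                error = function(e) conditionMessage(e))
err
```

If the goal is to predict between the design points, one option is to code each factor numerically (for example -1/+1) and fit it as a continuous variable. This assumes a linear effect between the levels.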

Overdispersion with binomial distribution

2009 Feb 16

I am attempting to run a glm with a binomial model to analyze proportion data. I have been following Crawley's book closely and am wondering if there is an accepted standard for how much overdispersion is too much (e.g. a change in AIC has an accepted standard of 2). In the example, he fits several models, binomial and quasibinomial, and then accepts the quasibinomial. The output for residual
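There is no universal cut-off. A common rule of thumb compares a dispersion estimate with 1: a value well above 1 suggests overdispersion. The sketch simulates overdispersed proportions with extra between-unit variation in p. All data and names are hypothetical. It then computes two estimates: residual deviance over residual df, and the Pearson-based ratio. The second is the dispersion that a quasibinomial fit reports.

```r
set.seed(3)
n <- 40
x <- runif(n)
trials <- rep(20, n)
# Extra noise on the logit scale makes the counts overdispersed
p <- plogis(-1 + 2 * x + rnorm(n, sd = 0.8))
succ <- rbinom(n, trials, p)

fit.b <- glm(cbind(succ, trials - succ) ~ x, family = binomial)

# Residual deviance / residual df (close to 1 without overdispersion)
dev.ratio <- deviance(fit.b) / df.residual(fit.b)

# Pearson chi-square / residual df
phi <- sum(residuals(fit.b, type = "pearson")^2) / df.residual(fit.b)

fit.q <- glm(cbind(succ, trials - succ) ~ x, family = quasibinomial)
c(deviance.ratio = dev.ratio, pearson = phi,
  quasi = summary(fit.q)$dispersion)
```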

how to store loop output from a function

2010 May 26

Hi, dear R community,

I am writing the following function to create one data set (tree.pred) and one vector (valid.out) from loops. Later, I want to use the data set from this loop to plot curves. I have tried return and list, but I cannot use the tree.pred data and the valid.out vector.

auc.tree<- function(msplit,mbucket) {
tree.pred<-data.frame()
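A function returns only one object, so a common pattern is to bundle both results in a named list and extract them after the call. In this sketch the loop body is a hypothetical placeholder. The real function would fit trees using msplit and mbucket.

```r
auc.tree <- function(msplit, mbucket) {
  tree.pred <- data.frame()
  valid.out <- numeric(0)
  for (i in seq_along(msplit)) {
    # Hypothetical placeholder for fitting a tree and scoring it
    score <- msplit[i] / (msplit[i] + mbucket[i])
    tree.pred <- rbind(tree.pred,
                       data.frame(msplit = msplit[i], mbucket = mbucket[i], score = score))
    valid.out <- c(valid.out, score)
  }
  # Return both objects together in a named list
  list(tree.pred = tree.pred, valid.out = valid.out)
}

res <- auc.tree(c(10, 20, 30), c(5, 5, 5))
res$tree.pred   # the data set, ready for plotting
res$valid.out   # the vector
```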

Rprof causing R to crash

2012 Dec 11

I'm trying to use Rprof() to identify bottlenecks and speed up a particularly slow section of code, which reads in a portion of a tif file and compares each of the values to the values of the predictors used for model fitting. I've written up an example that anyone can run. Generally temp would be a section of a tif read into a data.frame and used later for other processing. The first portion
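For reference, a minimal Rprof() session looks like the sketch below. It does not reproduce the crash described in the post. The sort() loop is only a placeholder for the slow section. Profiling is switched off with Rprof(NULL) before the output file is summarised.

```r
prof.file <- tempfile(fileext = ".out")

Rprof(prof.file, interval = 0.01)               # start sampling every 10 ms
for (i in 1:200) invisible(sort(runif(1e5)))    # placeholder for the slow section
Rprof(NULL)                                     # stop profiling and close the file

prof <- summaryRprof(prof.file)
head(prof$by.self)   # functions ranked by time spent in their own code
```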

svm.formula versus svm.default - different results

2017 Jul 06

Dear community,

I'm performing SVM regression with svm() from the e1071 library. As I wrote in another post ("svm e1071 call - different results"), I get different results if I use svm.default rather than svm.formula, with svm.formula giving the better results. I've debugged both options. While debugging svm.formula, I've seen that when I reach the call:

ret <-

Random Seed Location

2018 Feb 27

In case you don't get an answer from someone more knowledgeable:

1. I don't know.
2. But it is possible that other packages that are loaded after set.seed() fool with the RNG.
3. So I would call set.seed just before you invoke each random number generation to be safe.

Cheers,
Bert

Bert Gunter
"The trouble with having an open mind is that people keep coming along and sticking
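The sketch below illustrates Bert's third point. Any code that draws a random number between set.seed() and the partitioning step advances the generator, so the partition changes. Setting the seed immediately before the draw avoids that. Whether a particular package draws random numbers when loaded is not shown here; runif(1) simply stands in for such code.

```r
set.seed(123)
idx1 <- sample(100, 70)        # e.g. rows chosen for a training partition

set.seed(123)
invisible(runif(1))            # other code that draws a random number in between
idx2 <- sample(100, 70)

set.seed(123)                  # seed set immediately before the draw
idx3 <- sample(100, 70)

identical(idx1, idx3)          # same partition
identical(idx1, idx2)          # different partition: the stream was advanced
```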

Random Seed Location

2018 Feb 26

Hi all,

For some odd reason, when running naïve Bayes, k-NN, etc., I get slightly different results (e.g., error rates, classification probabilities) from run to run even though I am using the same random seed. Nothing else (input-wise) is changing, but my results are somewhat different from run to run. The only randomness should be in the partitioning, and I have set the seed before this

argument "x" is missing, with no default - Please help find argument x

2012 Jun 15

R programming question, not machine learning, although that's the content. Apologies to all for whom the following code is eye-burning. I am using foreach() to run a simulation on a randomForest model (actually a conditional randomForest ... "party" package). The simulation is in two dimensions, examining how "mtry" and "ntrees" are related in terms of predictive
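The two-dimensional simulation can be laid out in base R with expand.grid() and mapply(). This sketch uses them instead of foreach() to stay self-contained, and sim.one() is a hypothetical placeholder for fitting a forest. The last lines show a typical source of the quoted error: a function called without one of its required arguments.

```r
# One row per (mtry, ntree) combination to simulate
grid <- expand.grid(mtry = c(2, 4, 6), ntree = c(100, 500))

# Hypothetical placeholder for fitting a forest and returning its accuracy
sim.one <- function(mtry, ntree) mtry / 10 + log(ntree) / 100

grid$score <- mapply(sim.one, mtry = grid$mtry, ntree = grid$ntree)
grid

# Omitting a required argument yields the 'is missing, with no default' error
err <- tryCatch(sim.one(ntree = 100), error = function(e) conditionMessage(e))
err
```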

list to matrix?

2012 Dec 04

How do I convert a list to a matrix?

--8<---------------cut here---------------start------------->8---
list(c(50000, 101), c(1e+05, 46), c(150000, 31), c(2e+05, 17), c(250000, 19), c(3e+05, 11), c(350000, 12), c(4e+05, 25), c(450000, 19), c(5e+05, 16))

as.matrix(a)
      [,1]
 [1,] Numeric,2
 [2,] Numeric,2
 [3,] Numeric,2
 [4,] Numeric,2
 [5,] Numeric,2
 [6,] Numeric,2
 [7,]
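When every element of the list is a numeric vector of the same length, do.call(rbind, a) stacks the elements as rows. By contrast, as.matrix() on a list gives a one-column list matrix, which is what the output above shows. The sketch assumes the list from the post has been assigned to a.

```r
a <- list(c(50000, 101), c(1e+05, 46), c(150000, 31), c(2e+05, 17), c(250000, 19),
          c(3e+05, 11), c(350000, 12), c(4e+05, 25), c(450000, 19), c(5e+05, 16))

# Bind the length-2 vectors together as rows of a 10 x 2 matrix
m <- do.call(rbind, a)
dim(m)

# Alternative, assuming every element has exactly two values
m2 <- matrix(unlist(a), ncol = 2, byrow = TRUE)
all(m == m2)
```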

Band-wise Conditional Sum - Actual problem

2010 Aug 30

Dear R helpers,

Thanks a lot for your earlier guidance, especially Mr David Winsemius Sir. However, there seems to have been a miscommunication on my part regarding my requirement. As I mentioned in my earlier mail, I am dealing with a very large database of borrowers, and I gave a part of it in my earlier mail, as below. For a given rating, say "A", I needed to have the band-wise
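Suppose the requirement is a sum of exposures for each rating within each band. The post is cut off, so this reading is an assumption. A two-way table of sums then does it in one step with xtabs(). The band breaks, band labels and exposure amounts below are hypothetical.

```r
rating   <- c("A", "AAA", "A", "BBB", "AA", "A", "BB", "BBB", "AA", "AA")
exposure <- c(10, 25, 5, 40, 15, 20, 8, 12, 30, 7)    # hypothetical amounts

# Hypothetical bands: (0, 10], (10, 20], (20, Inf)
band <- cut(exposure, breaks = c(0, 10, 20, Inf), labels = c("low", "mid", "high"))

# Sum of exposure for every rating x band combination
tab <- xtabs(exposure ~ rating + band)
tab["A", ]   # band-wise sums for rating "A"
```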

followup: Re: Issue with predict() for glm models

2004 Sep 23

Could you just use lines(newX, myPred, col=2)?

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Paul Johnson
Sent: Thursday, September 23, 2004 10:3 AM
To: r help
Subject: followup: Re: [R] Issue with predict() for glm models

I have a follow up question that fits with this thread. Can you force an overlaid plot
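Here is a sketch of the suggested overlay on simulated data; x, y, newX and myPred are illustrative names. Draw the points first. Then add the predicted probabilities over a grid of new x values with lines().

```r
set.seed(4)
x <- runif(100, 0, 10)
y <- rbinom(100, 1, plogis(-3 + 0.6 * x))
fit <- glm(y ~ x, family = binomial)

# Predicted probabilities on an evenly spaced grid of new x values
newX <- seq(0, 10, length.out = 101)
myPred <- predict(fit, newdata = data.frame(x = newX), type = "response")

plot(x, y)                       # observed 0/1 outcomes
lines(newX, myPred, col = 2)     # overlay the fitted curve in red
```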

problem in testing data with e1071 package (SVM Multiclass)

2017 Sep 02

Hello all,

This is the first time I'm using R, the e1071 package and SVM multiclass (and I'm not a statistician)! So I'm very confused. The goal is: if I have a sentence with "sunny", it will be classified as a "yes" sentence; if I have a sentence with "cloud", it will be classified as "maybe"; if I have a sentence with "rainy", it will be classified as "no". The

outputs of KNN prediction

2004 Feb 23

Hello there,

I have 13 variables in my training/target set; the first 12 variables are a mixture of numerical and categorical variables. The last one is the one I need to predict, and it is a numerical variable.

>train<-read.table("train.txt")
>test<-read.table("test.txt")
>cl<-factor(train[,13])
>pred<-knn(train, test, cl, k=3, prob=TRUE)
>pred

I got
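The call matches knn() from the class package (shipped with R as a recommended package); that attribution is an assumption. This knn() is a classifier: it returns the majority-vote label, so a numeric target is treated as a set of categories rather than averaged. It also needs numeric predictors, and the label column should not be passed in train. The sketch uses simulated data with hypothetical column names. With prob = TRUE, the "prob" attribute holds the share of neighbour votes for the winning class.

```r
library(class)

set.seed(5)
train <- data.frame(x1 = rnorm(60), x2 = rnorm(60))
train$y <- ifelse(train$x1 + train$x2 > 0, "pos", "neg")
test <- data.frame(x1 = rnorm(10), x2 = rnorm(10))
cl <- factor(train$y)

# Pass predictors only (label column dropped) and the labels via 'cl'
pred <- knn(train[, c("x1", "x2")], test, cl, k = 3, prob = TRUE)
pred                    # predicted class for each test row
attr(pred, "prob")      # share of the k neighbours voting for that class
```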

how to tabulate the prediction values using the table function for naive Bayes in R

2010 Jun 30

Hi,

I have written code in R for classifying microarray data using naive Bayes; the code is given below:

library(e1071)
train<-read.table("Z:/Documents/train.txt",header=T);
test<-read.table("Z:/Documents/test.txt",header=T);
cl <- c(c(rep("ALL",10), rep("AML",10)));
cl <- factor(cl)
model <- naiveBayes(train,cl);
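By default, predict() on an e1071 naiveBayes model returns predicted class labels. Passing those labels together with the true classes to table() gives a confusion matrix. The predictions below are hypothetical stand-ins for predict(model, test), and giving both factors the same levels keeps the table square.

```r
actual <- factor(c(rep("ALL", 5), rep("AML", 5)))

# Hypothetical predicted labels, e.g. from predict(model, test)
pred <- factor(c("ALL", "ALL", "AML", "ALL", "ALL", "AML", "AML", "ALL", "AML", "AML"),
               levels = levels(actual))

conf <- table(predicted = pred, actual = actual)
conf

# Overall accuracy: correct predictions on the diagonal over all cases
accuracy <- sum(diag(conf)) / sum(conf)
accuracy
```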

Can this code be written more efficiently?

2010 Sep 30

Dear users,

I'm working on a binary classification problem using Support Vector Machines (SVM). My objective is to train a series of SVM models on a grid of hyperparameters and then select those that maximize the AUC based on an independent validation sample. My attempted code is shown below. It runs well on "small" data sets, but when I use it on a slightly larger sample (e.g., my
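The poster's code is cut off, so this is not their method. One way to keep the AUC step cheap inside a grid search is the rank-sum (Mann-Whitney) identity. It needs one call to rank() instead of comparing every positive-negative pair, and tied scores get average ranks, so ties count as one half. The helper name auc() is hypothetical, and labels are assumed to be coded 0/1.

```r
# AUC via the rank-sum identity; labels coded 0/1
auc <- function(score, label) {
  r <- rank(score)                       # average ranks handle ties
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(r[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc(c(0.9, 0.8, 0.3, 0.2), c(1, 1, 0, 0))   # perfect separation
auc(c(0.9, 0.1, 0.8, 0.2), c(1, 1, 0, 0))   # half the pairs ordered correctly
```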

Random Seed Location

2018 Mar 04

Thank you to everybody who replied! I appreciate your valuable advice! I will move the set.seed() command to after all packages have been installed and loaded.

Best regards,
Gary

Sent from my iPad

> On Mar 4, 2018, at 12:18 PM, Paul Gilbert <pgilbert902 at gmail.com> wrote:
>
> On Mon, Feb 26, 2018 at 3:25 PM, Gary Black <gwblack001 at sbcglobal.net>
>
