thr3ads.net - similar to: "Confused - better empirical results with error in data"

Displaying 20 results from an estimated 9000 matches similar to: "Confused - better empirical results with error in data"

Stepwise SVM Variable selection

2011 Jan 07

I have a data set with about 30,000 training cases and 103 variables. I've trained an SVM (using the e1071 package) as a binary classifier {0,1}. The accuracy isn't great. I used a grid search over the C and G parameters with an RBF kernel to find the best settings. I remember that for least squares, R has a nice stepwise function that will try combining subsets of variables to find

Sapply

2009 Aug 30

Hi, I need a bit of guidance with the sapply function. I've read the help page, but am still a bit unsure how to use it. I have a large data frame with about 100 columns and 30,000 rows. One of the columns is "group" of which there are about 2,000 distinct "groups". I want to normalize (sum to 1) one of my variables per-group. Normally, I would just write a huge
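A minimal base-R sketch of the per-group normalization described above, using ave() instead of sapply(). The data frame df and its value column are hypothetical stand-ins for the poster's 30,000-row data.

```r
# Hypothetical toy data: a "group" column plus a numeric column
# that should sum to 1 within each group.
df <- data.frame(
  group = c("a", "a", "b", "b", "b"),
  value = c(1, 3, 2, 2, 4)
)

# ave() applies FUN to each group's values separately and returns a
# vector aligned with the original rows, so it can be stored directly.
df$value_norm <- ave(df$value, df$group, FUN = function(v) v / sum(v))

print(df)

# Check: the normalized values sum to 1 in every group.
print(tapply(df$value_norm, df$group, sum))
```

Because ave() keeps the original row order, there is no need to merge per-group results back into the data frame, which is the extra step a tapply() or split() approach would need.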

Strange column shifting with read.table

2009 Aug 02

Hi, I am reading in a data frame from a CSV file. It has 70 columns. I do not have any kind of unique "row id". rawdata <- read.table("r_work/train_data.csv", header=T, sep=",", na.strings=0) When training an svm, I keep getting an error. So, as an experiment, I wrote the data back out to a new file so that I could see what the svm function sees.
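The post doesn't include the error message, so the following is only one plausible explanation. By default, write.table() and write.csv() also write row names as an unnamed first column, so the header line has one field fewer than the data lines and the columns look shifted when the file is opened elsewhere. Separately, na.strings=0 tells R to treat every literal 0 as missing. A minimal round-trip sketch on a hypothetical toy data frame:

```r
# Hypothetical toy data frame with no row-ID column, as in the post.
df <- data.frame(a = c(1, 0, 3), b = c("x", "y", "z"))

path <- tempfile(fileext = ".csv")

# row.names = FALSE writes only the real columns, so the header and
# the data lines have the same number of fields.
write.csv(df, path, row.names = FALSE)

# read.csv() is read.table() with header = TRUE and sep = ",".
# The default na.strings ("NA") keeps literal zeros as zeros.
df2 <- read.csv(path)
str(df2)

# By contrast, na.strings = "0" turns every 0 field into NA.
df3 <- read.csv(path, na.strings = "0")
str(df3)
```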

Plot multiple columns

2010 Jun 01

I'm running a long MCMC chain that is generating samples for 22 variables. I have each run of the chain as a row in a matrix. So: Chain[,1] is the column with all the samples for variable one. Chain[,2] is the column with all the samples for variable 2, etc. I'd like to fit all 22 on a single page to print a nice summary. It is OK if the graphs are small, I just need to show the
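A minimal sketch for putting all 22 variables on one page, assuming one column per variable as described. The chain matrix here is simulated, and both plot_all_columns() and the output file name are hypothetical. It uses n2mfrow() to choose a grid and par(mfrow = ...) to fill it with one small histogram per column.

```r
# Hypothetical stand-in for the MCMC output: 1,000 draws of 22 variables.
set.seed(1)
chain <- matrix(rnorm(1000 * 22), ncol = 22,
                dimnames = list(NULL, paste0("var", 1:22)))

# Hypothetical helper: one small histogram per column on a single page.
plot_all_columns <- function(chain) {
  layout_dims <- n2mfrow(ncol(chain))        # rows x cols grid for all panels
  old <- par(mfrow = layout_dims, mar = c(2, 2, 2, 1))  # small margins
  on.exit(par(old))                          # restore graphics settings
  for (j in seq_len(ncol(chain))) {
    hist(chain[, j], main = colnames(chain)[j], xlab = "", ylab = "")
  }
  invisible(layout_dims)
}

# Write the page to a PDF (the file name is only an example).
out <- file.path(tempdir(), "chain_summary.pdf")
pdf(out, width = 10, height = 10)
dims <- plot_all_columns(chain)
dev.off()
```

To get trace plots instead of distributions, replace the hist() call with plot(chain[, j], type = "l", main = colnames(chain)[j]).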

R on Multi Core

2009 Sep 11

Hi, our discussions about 64-bit R have led me to another thought. I have a nice dual-core 3.0 chip inside my Linux box (running Fedora 11). Is there a version of R that would take advantage of BOTH cores? (Watching my system performance meter is interesting: running R holds a single core at a steady 100%, but the other core sits idle.) Thanks! -- Noah
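The R interpreter runs each session on a single thread, so using more cores normally means splitting independent work across processes. Since R 2.14.0, the bundled parallel package (which took over much of the earlier multicore and snow packages) has provided functions for this. A minimal sketch, using a hypothetical slow_square() in place of real work:

```r
library(parallel)  # bundled with R since 2.14.0

# Hypothetical slow function standing in for an expensive computation.
slow_square <- function(i) {
  Sys.sleep(0.1)
  i^2
}

# mclapply() forks worker processes. Forking is not available on Windows,
# where mc.cores must be 1 (makeCluster() + parLapply() work there instead).
n_cores <- if (.Platform$OS.type == "windows") 1L else 2L

res_parallel <- mclapply(1:8, slow_square, mc.cores = n_cores)
res_serial   <- lapply(1:8, slow_square)
```

The speedup only appears when the loop iterations are independent. A single long computation still runs on one core unless the underlying code, such as a multithreaded BLAS for linear algebra, is itself parallel.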

Create Variable names dynamically

2011 Mar 31

Hi, I want to create variable names from within my code, but can't find any documentation for this. An example is probably the best way to illustrate. I am reading data in from a file, doing a bunch of stuff, and want to generate variables with my output. (I could make a "list of lists" and name all the elements, but I really want separate variables.) ################# #This is
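Base R's assign() binds a value to a name built at run time, and get() reads it back. A minimal sketch with hypothetical outputs and a hypothetical result_ prefix:

```r
# Hypothetical per-file outputs.
outputs <- list(10, 20, 30)

for (i in seq_along(outputs)) {
  # paste0() builds the name ("result_1", "result_2", ...);
  # assign() creates a variable with that name in the current environment.
  assign(paste0("result_", i), outputs[[i]])
}

# get() looks up a variable from its name given as a string.
get("result_2")
```

As the post notes, a named list is usually easier to loop over later. The sketch simply shows the separate-variables approach the poster asked for.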

SVM coefficients

2009 Aug 30

Hello, I'm using the svm function from the e1071 package. It works well and gives me nice results. I'm very curious to see the actual coefficients calculated for each input variable. (Other packages, like RapidMiner, show you this automatically.) I've tried looking at the attributes of the model and do see a "coefficients" item, but printing it returns NULL.

Different way of scaling data

2009 Oct 16

Hi, I have a data.frame that I need to scale. I've been using the scale function and it works nicely. Some of the libraries I'm testing won't accept negative values, so I need a way to scale the data to the range 0 to 1. Any ideas? Thanks!
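Min-max rescaling maps each column's minimum to 0 and its maximum to 1, so every value becomes non-negative. A minimal sketch with a hypothetical helper rescale01() and hypothetical data:

```r
# Hypothetical helper: min-max rescaling of a numeric vector to [0, 1].
rescale01 <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

df <- data.frame(a = c(-2, 0, 2), b = c(10, 15, 30))  # hypothetical data

# Apply column by column; assigning into df_scaled[] keeps the
# data.frame structure and column names.
df_scaled <- df
df_scaled[] <- lapply(df, rescale01)
df_scaled
```

A constant column makes the denominator zero and produces NaN, so such columns need special handling. When scaling new data for prediction, it is common to reuse the training set's minimum and maximum rather than recomputing them.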

Pull Coefficients from MCMCpack models

2009 Sep 22

Hi, I've been testing some models with the MCMCpack library. I can run the process and get a nice model "object". I can easily see the summary and even plot it. I can't seem to figure out how to: 1) Access the final coefficients in the model 2) Turn the coefficients into a model so I can then run predictions using them. A summary command will SHOW me the coefficients, but

Clogit or LRM?

2009 Aug 25

Hello, I believe that I'm getting very close in my modeling application. I've come across a challenge that I am unable to solve and would really appreciate the group's opinion. I've been using the val.prob function from the Design library (Thanks Frank!!) to both evaluate and visualize my model. From the scores and graph, it appears as if my model is very accurate in

Easy way to get top 2 items from vector

2009 Sep 03

Hi, I use the max function often to find the top value from a matrix or column of a data.frame. Now I'm looking to find the top 2 (or three) values from my data. I know that I could sort the list and then access the first two items, but that seems like the "long way". Is there some way to access "max_2" or similar? Thanks! -- Noah
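Base R has no max_2 function, but sorting in decreasing order and keeping the first n elements works in one line. A minimal sketch on hypothetical data, which also returns the positions of those values:

```r
x <- c(5, 1, 9, 3, 8, 7)  # hypothetical data

# Top 2 values, largest first.
top2 <- head(sort(x, decreasing = TRUE), 2)

# Positions of the top 2 values in the original vector.
top2_idx <- head(order(x, decreasing = TRUE), 2)

top2
top2_idx
```

The same code works on a data.frame column, for example head(sort(df$col, decreasing = TRUE), 3) for the top three values.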

accessing return variables from a function

2010 Jul 09

Hi, I am trying to figure out a "short" way to access the two values returned by the sort function.
>x <- c(3,4,3,6,78,3,1,2)
>sort(x, index.return=T)
$x
[1] 1 2 3 3 3 4 6 78
$ix
[1] 7 8 1 3 6 2 4 5
It would be great to do something like this:
c(y, indexes) <- sort(x, index.return=T)
But that doesn't work. I CAN grab the output of sort in a
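R has no multiple-assignment syntax in base. The usual approach is to store the returned list once and extract its elements. Another is to compute the ordering with order() and derive the sorted values from it. A minimal sketch using the vector from the post:

```r
x <- c(3, 4, 3, 6, 78, 3, 1, 2)

# sort(..., index.return = TRUE) returns a list with elements $x and $ix.
s <- sort(x, index.return = TRUE)
y <- s$x
indexes <- s$ix

# Alternative: get the ordering once and index x with it.
indexes2 <- order(x)
y2 <- x[indexes2]
```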

Performance measure for probabilistic predictions

2009 Aug 19

Hello, I'm using an SVM for predicting a model, but I'm most interested in the probability output. This is easy enough to calculate. My challenge is how to measure the relative performance of the SVM for different settings/parameters/etc. An ROC curve and its AUC come to mind, but I'm NOT interested in predicting true vs false. I am interested in finding the most accurate probability
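Proper scoring rules such as the Brier score and log loss judge the predicted probabilities themselves rather than a true/false ranking. Lower is better for both. A minimal sketch with hypothetical helpers brier_score() and log_loss(), where p is the predicted probability of class 1 and y is the observed 0/1 outcome:

```r
# Brier score: mean squared difference between probability and outcome.
brier_score <- function(p, y) mean((p - y)^2)

# Log loss: negative mean log-likelihood of the outcomes.
# Probabilities are clipped away from 0 and 1 to avoid log(0).
log_loss <- function(p, y, eps = 1e-15) {
  p <- pmin(pmax(p, eps), 1 - eps)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

y  <- c(1, 0, 1, 1, 0)            # hypothetical observed outcomes
pA <- c(0.9, 0.2, 0.7, 0.6, 0.1)  # hypothetical model A probabilities
pB <- c(0.6, 0.4, 0.5, 0.5, 0.4)  # hypothetical model B probabilities

brier_score(pA, y)
brier_score(pB, y)
log_loss(pA, y)
log_loss(pB, y)
```

To compare SVM settings, compute one of these scores on held-out data for each setting and keep the setting with the lowest score.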

Nominal variables in SVM?

2009 Aug 12

Hi, the answers to my previous question about nominal variables have led me to a more important question. What is the "best practice" way to feed a nominal variable to an SVM? For example: color = ("red", "blue", "green") I could translate that into an index so I wind up with color = (1,2,3) But my concern is that the SVM will now think that the values are
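A common way to avoid implying an order among the categories is one-hot encoding, which uses one 0/1 indicator column per level. A minimal sketch with model.matrix() on a hypothetical color factor:

```r
color <- factor(c("red", "blue", "green", "red"))  # hypothetical data

# "- 1" drops the intercept, so every level gets its own 0/1 column
# (colorblue, colorgreen, colorred) and each row has exactly one 1.
onehot <- model.matrix(~ color - 1)
onehot
```

These indicator columns can then be combined with the numeric inputs with cbind() before training.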

Discretize factors?

2010 May 15

Hi, I'm looking for an easy way to discretize factors in R. I've noticed that the lm function does this automatically with a nice result. If I have group <- c("A", "B","B","C","C","C") and run: lm(result ~ x1 + group) The lm function has split the group into separate binary variables {0,1} before performing the
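The expansion that lm() performs is available directly through model.matrix(), which builds the same design matrix from a formula. A minimal sketch with hypothetical result and x1 values alongside the post's group vector:

```r
# Hypothetical data; the group column matches the post's example.
d <- data.frame(
  result = c(1.2, 2.3, 2.1, 3.8, 4.0, 3.5),
  x1     = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0),
  group  = c("A", "B", "B", "C", "C", "C")
)

# The design matrix has an intercept, x1, and 0/1 columns groupB and
# groupC. "A" is the baseline level (default treatment contrasts).
X <- model.matrix(result ~ x1 + group, data = d)
X
```

X[, -1] drops the intercept column if only the predictors are needed, for example as input to a model function that takes a plain matrix.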

[OT] book on Linux scripting

2009 Sep 03

Dear R People: I know that this is off topic, but could anyone recommend a good book on Linux scripting please? Any help would be much appreciated! Thanks, Erin -- Erin Hodgess Associate Professor Department of Computer and Mathematical Sciences University of Houston - Downtown mailto: erinm.hodgess at gmail.com

Managing output

2009 Aug 26

Hi, is there a way to build up a vector item by item? In perl, we can "push" an item onto an array. How can we do this in R? I have a loop that generates values as it goes. I want to end up with a vector of all the loop results. In perl it would be: for(item in list){ result <- 2*item^2 (Or whatever formula, this is just a pseudo example) Push(@result_list,
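The closest analogue of Perl's push in R is appending with c(). A minimal sketch on hypothetical input shows that approach, a preallocated version, and a vectorized version, all producing the same result:

```r
items <- c(1, 2, 3, 4)  # hypothetical loop input

# Append with c(): simple, but the vector is copied on every append.
result_list <- c()
for (item in items) {
  result <- 2 * item^2
  result_list <- c(result_list, result)
}

# Preallocate the output and fill it by index.
result_vec <- numeric(length(items))
for (i in seq_along(items)) {
  result_vec[i] <- 2 * items[i]^2
}

# Often the loop isn't needed at all.
result_vectorized <- 2 * items^2
```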

Populating then sorting a matrix and/or data.frame

2010 Nov 11

Hi, I have a process in R that produces a lot of output. My plan was to build up a matrix or data.frame "row by row", so that I'll have a nice object with all the resulting data. I started with:
results <- matrix(ncol=3)
names(results) <- c("one", "two", "three")
Then, when looping through the data:
results <- rbind(results, c(a,b,c))
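Two details in the snippet above are worth checking. matrix(ncol=3) starts with one row of NAs, and names() does not label a matrix's columns (colnames() does). A minimal sketch that starts from zero rows, fills the matrix in a loop of hypothetical values, and then sorts the rows with order():

```r
# Zero-row starting matrix; colnames() labels the columns.
results <- matrix(numeric(0), ncol = 3)
colnames(results) <- c("one", "two", "three")

# Hypothetical loop producing three values per iteration.
for (k in 1:4) {
  a <- k
  b <- (5 - k) * 10
  c_val <- k^2   # avoid naming a variable "c", which masks c() visually
  results <- rbind(results, c(a, b, c_val))
}

# Sort rows by the "two" column, ascending; order() returns row indices.
results_sorted <- results[order(results[, "two"]), ]
results_sorted
```

Use order(results[, "two"], decreasing = TRUE) for descending order. rbind() copies the whole matrix on each iteration, so for very long loops it is cheaper to collect rows in a list and bind them once at the end.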

Build a dataframe row by row?

2009 Aug 04

Hi, time for another of my "newbie" questions. Is it possible to build up a data.frame "row by row" as I go? I'm going to be running a bunch of experiments (many in a loop) to test different things. I'm using AUC as my main performance measure. My thought was to add a row to a data.frame for each iteration and then have a nice summary report at the end. I found
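A minimal sketch of the per-iteration summary described above. It stores one single-row data.frame per experiment in a preallocated list and binds them once with do.call(rbind, ...). The settings and the AUC formula are hypothetical placeholders.

```r
# Hypothetical experiment settings, one experiment per value.
settings <- c(0.1, 1, 10)
rows <- vector("list", length(settings))

for (i in seq_along(settings)) {
  auc <- 0.5 + i / 10   # placeholder for a real AUC computation
  rows[[i]] <- data.frame(iteration = i, setting = settings[i], auc = auc)
}

# Bind all rows at once; cheaper than growing a data.frame inside the loop.
summary_df <- do.call(rbind, rows)
summary_df
```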

Summarizing counts by multiple factors

2010 May 12

Hi, an example data set is:

group level color
A     1     "blue"
A     1     "Red"
B     1     "blue"
B     2     "Red"
A     2     "Red"
B     2     "Red"
B     2     "blue"
B     2     "blue"
A     2     "blue"
A     2     "Red"
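xtabs() counts every combination of several factors in one call, and as.data.frame() turns the resulting table into long format. A minimal sketch using the example data above. Note that "Red" and "blue" are kept exactly as written, so capitalization matters.

```r
# The example data set from the post.
df <- data.frame(
  group = c("A", "A", "B", "B", "A", "B", "B", "B", "A", "A"),
  level = c(1, 1, 1, 2, 2, 2, 2, 2, 2, 2),
  color = c("blue", "Red", "blue", "Red", "Red", "Red",
            "blue", "blue", "blue", "Red")
)

# Counts for every group x level x color combination (zeros included).
counts <- xtabs(~ group + level + color, data = df)
ftable(counts)            # flat, readable layout of the 3-way table

# Long format: one row per combination, with the count in column Freq.
as.data.frame(counts)
```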
