thr3ads.net - similar to: "Factoring a variable"

2012 Nov 29

Fast Normalize by Group

Hi, I have a very large data set (approx. 100,000 rows). The data comes from around 10,000 "groups" with about 10 entries per group. The values are in one column, the group ID is an integer in the second column. I want to normalize the values by group: for(g in unique(group)){ x[group==g] / sum(x[group==g]) } This works fine in a loop, but is slow. Is there a faster way to do
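One possible vectorized approach to the question above, as a minimal sketch in base R. The names `x` and `group` follow the post; the toy values are made up for illustration.

```r
# Toy data: values in x, integer group IDs in group (illustrative values only).
x     <- c(1, 3, 2, 2, 5, 5)
group <- c(1L, 1L, 2L, 2L, 3L, 3L)

# ave() returns a vector as long as x that holds, for every row,
# the sum of its own group, so the division needs no explicit loop.
normalized <- x / ave(x, group, FUN = sum)
```

Within each group the normalized values add up to 1, which is a quick way to check the result on real data.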

Stepwise SVM Variable selection

2011 Jan 07

I have a data set with about 30,000 training cases and 103 variables. I've trained an SVM (using the e1071 package) for a binary classifier {0,1}. The accuracy isn't great. I used a grid search over the C and G parameters with an RBF kernel to find the best settings. I remember that for least squares, R has a nice stepwise function that will try combining subsets of variables to find

Nominal variables in SVM?

2009 Aug 12

Hi, The answers to my previous question about nominal variables have led me to a more important question. What is the "best practice" way to feed nominal variables to an SVM? For example: color = ("red", "blue", "green") I could translate that into an index so I wind up with color = (1,2,3) But my concern is that the SVM will now think that the values are

Save model and predictions from svm

2009 Aug 04

Hello, I'm using the e1071 package for training an SVM. It seems to be working well. This question has two parts: 1) Once I've trained an SVM model, I want to USE it within R at a later date to predict various new data. I see the write.svm command, but don't know how to LOAD the model back in so that I can use it tomorrow. How can I do this? 2) I would like to add the
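For the first part of the question, one common route is to serialize the fitted object itself with base R's `saveRDS()` and `readRDS()`, which work on any R object. The sketch below uses an `lm` fit as a stand-in so that it runs without extra packages; that substitution, the temporary file path, and the toy data are assumptions, and using the restored object with `predict()` on an e1071 model would additionally require loading that package first.

```r
# Stand-in model: an lm fit replaces the SVM so the example runs with base R only.
fit <- lm(mpg ~ wt, data = mtcars)

# Save the fitted object to disk (a temporary file here; any path works).
model_path <- tempfile(fileext = ".rds")
saveRDS(fit, model_path)

# Later, possibly in a new R session: load the object and predict on new data.
restored    <- readRDS(model_path)
new_data    <- data.frame(wt = c(2.5, 3.0))
predictions <- predict(restored, newdata = new_data)
```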

Decision Trees or Markov Models for Cost Effectiveness

2012 Jun 11

Hello, I was just assigned to perform a cost effectiveness study in healthcare. We are studying the cost effectiveness of a proposed diagnostic vs. current screening procedures. One of the team members suggested a commercial software package called "TreeAge Pro". Looking at the description, it appears to be a nice GUI to some very simple models that could be easily constructed in R.

Confused - better empirical results with error in data

2009 Sep 07

Hi, I have a strange one for the group. We have a system that predicts probabilities using a fairly standard svm (e1071). We are looking at probabilities of a binary outcome. The input data is generated by a Perl script that calculates a bunch of things, fetches data from a database, etc. We train the system on 30,000 examples and then test the system on an unseen set of 5,000 records.

SVM coefficients

2009 Aug 30

Hello, I'm using the svm function from the e1071 package. It works well and gives me nice results. I'm very curious to see the actual coefficients calculated for each input variable. (Other packages, like RapidMiner, show you this automatically.) I've tried looking at attributes for the model and do see a "coefficients" item, but printing it returns a NULL result.

Errors with RVM and LSSVM from kernlab library

2009 Aug 19

Hello, In my ongoing quest to develop a "best" model, I'm testing various forms of SVM to see which is best for my application. I have been using the SVM from the e1071 library without problem for several weeks. Now, I'm interested in RVM and LSSVM to see if I get better performance. When running RVM or LSSVM on the exact same data as the SVM{e1071}, I get an error that I

Convert COLON separated format

2012 Oct 09

I have a bunch of data sets that were created for the libsvm tool. They are in "colon separated sparse format". i.e. 1 5:1 27:3 345:10 Is a row with the label of "1" and only has values in columns 5, 27, and 345. I want to read these into a data.frame in R. Is there a simple way to do this? -- Noah Silverman, M.S. UCLA Department of Statistics 8117 Math Sciences
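A minimal base R sketch of one way to read this format into a data.frame. The helper name `parse_libsvm_line` and the two sample rows are hypothetical; only the first row comes from the post. The sketch expands the sparse rows into a dense matrix, which may be too large for very wide data sets.

```r
# Two sample rows in "label index:value ..." format (the second row is made up).
lines <- c("1 5:1 27:3 345:10", "0 2:4 5:2")

# Hypothetical helper: split one line into its label, column indices, and values.
parse_libsvm_line <- function(line) {
  tokens <- strsplit(trimws(line), "[[:space:]]+")[[1]]
  pairs  <- strsplit(tokens[-1], ":", fixed = TRUE)
  list(
    label = as.numeric(tokens[1]),
    index = as.integer(vapply(pairs, `[`, character(1), 1)),
    value = as.numeric(vapply(pairs, `[`, character(1), 2))
  )
}

rows   <- lapply(lines, parse_libsvm_line)
n_cols <- max(unlist(lapply(rows, `[[`, "index")))

# Fill a dense matrix of zeros, one row per input line.
dense <- matrix(0, nrow = length(rows), ncol = n_cols)
for (i in seq_along(rows)) {
  dense[i, rows[[i]]$index] <- rows[[i]]$value
}

# The label comes first, followed by columns X1 ... X345.
df <- data.frame(label = vapply(rows, `[[`, numeric(1), "label"), dense)
```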

svm works but tune.svm gives error

2009 Jul 18

Hello, I'm using the e1071 library for SVM functions. I can quickly train an SVM with: svm(formula = label ~ ., data = testdata) That works well. I want to tune the parameters, so I tried: tune.svm(label ~ ., data=testdata[1:2000, ], gamma=10^(-6:3), cost=10^(1:2)) THIS FAILS WITH AN ERROR: 'names' attribute [199] must be the same length as the vector [184] I don't

dummy variables from factors

2011 Aug 23

Hi, Looking at a large data set with many factors. I would like to expand each factor variable into multiple new variables for each level. (0,1) coding. My first thought was just to code a big nasty loop, to take each level and cbind a column onto my data set. But, that seems painful. There must be a better way. Is there an "easy" way to do this in R? (Note, I don't want to
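One way to get 0/1 indicator columns without a hand-written loop is `model.matrix()`, sketched below. The data frame `d` and the helper `expand_factor` are hypothetical, and because the post is cut off before its final note, this may not satisfy whatever constraint the author went on to state.

```r
# Toy data frame with two factor columns (illustrative only).
d <- data.frame(
  color = factor(c("red", "blue", "green", "blue")),
  size  = factor(c("S", "L", "S", "M"))
)

# Hypothetical helper: build a full set of indicator columns for one factor.
# Dropping the intercept keeps a column for every level of that factor.
expand_factor <- function(name, data) {
  model.matrix(reformulate(name, intercept = FALSE), data = data)
}

# Expand every factor separately and bind the indicator columns together.
dummies <- do.call(cbind, lapply(names(d), expand_factor, data = d))
```

Each factor is expanded on its own because a single formula with several factors and no intercept would keep all levels only for the first factor.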

Failure building any package

2012 May 18

Hello, I'm attempting to build a package using R 2.15.0 on OS X. I am getting a generic failure when performing a CRAN-type check on the package. Even with a very simple test package, it still fails in the same place. Example: In R: rm(list=ls()) foo <- function(x){print(x)} package.skeleton(name="foo") Then, at the command line: R CMD build foo R CMD check --as-cran

Reading in csv with footer

2012 Feb 13

Hi, I have a CSV file that is formatted well, except that the last line is a "summary", not in CSV format. Toy example: label_1, label_2, label_3 1,2,3 3,2,4 2,3,4 Total Rows: 3 When I try to import this into R with: d <- read.table("foo.csv", header=T, sep=",") It fails to import properly because of the last line. Currently, I have a shell script that strips
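A sketch of doing the stripping inside R rather than in a shell script: read the raw lines, drop the last one, and parse the rest. The temporary file stands in for the post's foo.csv, and the sketch assumes the footer is always exactly one line.

```r
# Recreate the toy file from the post in a temporary location.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("label_1, label_2, label_3", "1,2,3", "3,2,4", "2,3,4", "Total Rows: 3"),
  csv_path
)

# Read everything as text and drop the final summary line.
raw_lines  <- readLines(csv_path)
body_lines <- head(raw_lines, -1)

# Parse the remaining text as CSV; strip.white trims spaces around fields.
d <- read.csv(text = paste(body_lines, collapse = "\n"),
              header = TRUE, strip.white = TRUE)
```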

Data format for KSVM

2009 Oct 23

Hi, I have a process using svm from the e1071 library. It works. I want to try using the KSVM library instead. The same data used with e1071 gives me an error with KSVM. My data is a data.frame. sample code: svm_formula <- formula(y ~ a + B + C) svm_model <- ksvm(formula, data=train_data, type="C-svc", kernel="rbfdot", C=1) I get the following error:

Menus - best practices?

2012 May 18

Hello, I need to design a fairly simple front-end for someone to use an R script system that I've built. My thought was to just use the text based menus available in the base R package, perhaps in some kind of loop. How have other people done this? Any "best practices" that you can recommend? Thanks! -- Noah Silverman UCLA Department of Statistics 8117 Math Sciences Building

Converting to XTS loses data.frame structure

2012 May 29

Hello, I noticed something odd when working with data frames and xts objects. If I read in a CSV file, R creates a nice data.frame. This works well. If I then convert to an XTS object, I see that all the values in the data are now quoted. My data is a mix of numeric and character. This is usually seen when converting a data.frame to a matrix, as R will treat all the data as the same class.
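The coercion described above can be reproduced in base R without xts. The sketch below uses a made-up two-column data frame to show that a matrix built from mixed columns becomes character, and that selecting the numeric columns first keeps the numbers numeric.

```r
# Toy data frame mixing a numeric and a character column (illustrative only).
d <- data.frame(price = c(1.5, 2.5), ticker = c("A", "B"),
                stringsAsFactors = FALSE)

# A matrix holds a single type, so the numbers are converted to strings.
m <- as.matrix(d)
mixed_type <- typeof(m)

# Keeping only the numeric columns before converting preserves the numbers.
numeric_only <- as.matrix(d[sapply(d, is.numeric)])
numeric_type <- typeof(numeric_only)
```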

Avoiding for Loop for moving average

2011 Sep 02

Hello, I need to calculate a moving average and an exponentially weighted moving average over a fairly large data set (500K rows). Doing this in a for loop works nicely, but is slow. ewma <- data$col[1] N <- dim(data)[1] for(i in 2:N){ data$ewma <- alpha * data$ewma[i-1] + (1-alpha) * data$value[i] } Since the moving average "accumulates" as we move through the data,
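The recursion in the post can be expressed with `stats::filter()` in recursive mode, which avoids the explicit R-level loop. This is a sketch under assumptions: the first EWMA value is taken to be the first observation, and `alpha` and the toy `value` vector are made up. A loop version is included only to show that both give the same result.

```r
alpha <- 0.9
value <- c(10, 12, 11, 15, 14)

# Reference loop: ewma[i] = alpha * ewma[i-1] + (1 - alpha) * value[i].
ewma_loop <- numeric(length(value))
ewma_loop[1] <- value[1]
for (i in 2:length(value)) {
  ewma_loop[i] <- alpha * ewma_loop[i - 1] + (1 - alpha) * value[i]
}

# Recursive filter: y[i] = input[i] + alpha * y[i-1], starting from zero.
# Scaling every input except the first by (1 - alpha) reproduces the loop.
input     <- c(value[1], (1 - alpha) * value[-1])
ewma_fast <- as.numeric(stats::filter(input, filter = alpha, method = "recursive"))
```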

Date Math

2012 Oct 14

Hello, I have a time series object (xts) that I iterate over in a loop. Works fine. My challenge is that I want to be able to reference other entries in the series by math. i.e. For today's observation, what were the last 5 observations? If indexed numerically, it is trivial, but I can't figure out how to do this with dates. This is slightly more difficult as there may not be an
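A sketch of the "last 5 observations before a given date" lookup using plain base R vectors rather than an xts object, which is an assumption made so the example runs without extra packages. The helper `previous_n` and the toy dates are hypothetical. The lookup works by position, so gaps in the dates do not matter.

```r
# Sorted observation dates with gaps, and one value per date (illustrative only).
dates  <- as.Date(c("2012-10-01", "2012-10-02", "2012-10-04", "2012-10-05",
                    "2012-10-08", "2012-10-09", "2012-10-10"))
values <- seq_along(dates) * 10

# Hypothetical helper: the n observations strictly before `today`.
previous_n <- function(dates, values, today, n = 5) {
  # findInterval() counts how many dates fall on or before the day before today.
  pos <- findInterval(as.numeric(today) - 1, as.numeric(dates))
  if (pos == 0) return(values[0])
  values[seq(max(1, pos - n + 1), pos)]
}

last_five   <- previous_n(dates, values, as.Date("2012-10-09"))
missing_day <- previous_n(dates, values, as.Date("2012-10-03"))
```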

Cleaning up messy Excel data

2012 Feb 28

Unfortunately, some data I need to work with was delivered in a rather messy Excel file. I want to import into R and clean up some things so that I can do my analysis. Pulling in a CSV from Excel is the easy part. My current challenge is dealing with some text mixed in the values. i.e. 118 5.7 <2.0 3.7 Since this column in Excel has a "<2.0" value, then R reads the
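A minimal sketch of one way to handle the "<2.0" entries: keep a flag marking which values were reported as below a limit, then strip the "<" and convert to numeric. The variable names are hypothetical, and replacing "<2.0" with 2.0 is only one possible treatment; how such values should enter the analysis is a separate decision.

```r
# The column as R would read it: character, because of the "<2.0" entry.
raw <- c("118", "5.7", "<2.0", "3.7")

# Remember which entries were reported as "less than" some limit.
below_limit <- startsWith(raw, "<")

# Drop the leading "<" and convert the remaining text to numbers.
numeric_value <- as.numeric(sub("^<", "", raw))
```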

nominal to numeric function

2009 Aug 12

Hi, I'm training an SVM (C-classification from e1071 library). Some of the variables in my data set are nominal. Is there some easy/automatic way to convert them to numerical representations? Thanks, -N
