similar to: Factoring a variable

Displaying 20 results from an estimated 10000 matches similar to: "Factoring a variable"

2012 Nov 29
7
Fast Normalize by Group
Hi, I have a very large data set (approx. 100,000 rows). The data come from around 10,000 "groups" with about 10 entries per group. The values are in one column; the group ID is an integer in the second column. I want to normalize the values by group: for (g in unique(groups)) { x[groups == g] <- x[groups == g] / sum(x[groups == g]) } This works fine in a loop, but is slow. Is there a faster way to do
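A possible loop-free sketch uses base R's ave(), which applies a function within each group and returns a vector aligned with the input. The names x and groups follow the snippet above, and the function name normalize_by_group and the toy values are made up for illustration.

```r
# Normalize each value by the sum of its group, without an explicit loop.
# ave() splits x by groups, applies FUN to each piece, and puts the
# results back in the original row order.
normalize_by_group <- function(x, groups) {
  ave(x, groups, FUN = function(v) v / sum(v))
}

# Toy data: two groups with two values each (illustrative only).
x <- c(1, 3, 2, 2)
groups <- c(1, 1, 2, 2)
normalize_by_group(x, groups)  # c(0.25, 0.75, 0.5, 0.5)
```

Because ave() keeps the original order, the result can be assigned straight back to the data column.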
2011 Jan 07
2
Stepwise SVM Variable selection
I have a data set with about 30,000 training cases and 103 variables. I've trained an SVM (using the e1071 package) as a binary classifier {0,1}. The accuracy isn't great. I used a grid search over the C and gamma parameters with an RBF kernel to find the best settings. I remember that for least squares, R has a nice stepwise function that will try combining subsets of variables to find
2009 Aug 12
5
Nominal variables in SVM?
Hi, The answers to my previous question about nominal variables have led me to a more important question: what is the "best practice" way to feed nominal variables to an SVM? For example: color = c("red", "blue", "green"). I could translate that into an index so I wind up with color = c(1, 2, 3). But my concern is that the SVM will now think that the values are
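One common approach is one-hot coding: one 0/1 column per level, so no artificial order such as red < blue < green is implied. This is not necessarily what the replies in that thread recommended. Here is a minimal sketch with base R's model.matrix(). The color vector mirrors the example above, with one value repeated for illustration.

```r
# One-hot ("dummy") encoding of a nominal variable.
color <- factor(c("red", "blue", "green", "red"))

# "- 1" drops the intercept, which keeps one indicator column per level
# instead of using the first level as a baseline.
onehot <- model.matrix(~ color - 1)
onehot
```

Levels are sorted alphabetically by factor(), so the columns come out as colorblue, colorgreen, colorred.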
2009 Aug 04
1
Save model and predictions from svm
Hello, I'm using the e1071 package for training an SVM. It seems to be working well. This question has two parts: 1) Once I've trained an SVM model, I want to USE it within R at a later date to make predictions on new data. I see the write.svm() function, but don't know how to LOAD the model back in so that I can use it tomorrow. How can I do this? 2) I would like to add the
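As far as I know, write.svm() is meant for exporting a model in libsvm's file format rather than for reloading it in R. Base R's saveRDS()/readRDS() can store and restore ordinary R objects, and that should include a fitted svm model. In this sketch lm() stands in for e1071::svm() so it runs without extra packages, and the file location is hypothetical.

```r
# Persist a fitted model to disk and reload it in a later session.
# lm() is a stand-in for e1071::svm(); saveRDS()/readRDS() handle any
# ordinary R object the same way.
fit <- lm(mpg ~ wt, data = mtcars)

model_path <- file.path(tempdir(), "model.rds")  # hypothetical location
saveRDS(fit, model_path)

# ... later, e.g. in a new R session:
fit_reloaded <- readRDS(model_path)
new_data <- data.frame(wt = c(2.5, 3.0))
predict(fit_reloaded, newdata = new_data)
```

For a reloaded svm model, e1071 would still need to be loaded (library(e1071)) so that its predict() method is available.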
2012 Jun 11
3
Decision Trees or Markov Models for Cost Effectiveness
Hello, I was just assigned to perform a cost-effectiveness study in healthcare. We are studying the cost-effectiveness of a proposed diagnostic vs. current screening procedures. One of the team members suggested a commercial software package called "TreeAge Pro". Looking at the description, it appears to be a nice GUI for some very simple models that could easily be constructed in R.
2009 Sep 07
2
Confused - better empirical results with error in data
Hi, I have a strange one for the group. We have a system that predicts probabilities using a fairly standard SVM (e1071). We are looking at probabilities of a binary outcome. The input data is generated by a Perl script that calculates a bunch of things, fetches data from a database, etc. We train the system on 30,000 examples and then test the system on an unseen set of 5,000 records.
2009 Aug 30
1
SVM coefficients
Hello, I'm using the svm function from the e1071 package. It works well and gives me nice results. I'm very curious to see the actual coefficients calculated for each input variable. (Other tools, like RapidMiner, show this automatically.) I've tried looking at the model's attributes and do see a "coefficients" item, but printing it returns NULL.
2009 Aug 19
1
Errors with RVM and LSSVM from the kernlab package
Hello, In my ongoing quest to develop a "best" model, I'm testing various forms of SVM to see which is best for my application. I have been using the SVM from the e1071 library without problems for several weeks. Now, I'm interested in RVM and LSSVM to see if I get better performance. When running RVM or LSSVM on the exact same data as svm() from e1071, I get an error that I
2012 Oct 09
4
Convert COLON separated format
I have a bunch of data sets that were created for the libsvm tool. They are in "colon-separated sparse format", i.e., "1 5:1 27:3 345:10" is a row with the label "1" that only has values in columns 5, 27, and 345. I want to read these into a data.frame in R. Is there a simple way to do this? -- Noah Silverman, M.S. UCLA Department of Statistics 8117 Math Sciences
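Here is a base-R sketch of a parser for this format. The function name read_libsvm and the V1, V2, ... column naming are my own choices. It assumes one record per line with no blank lines, and it builds a dense matrix, so very wide data may need a sparse representation instead. I believe e1071 also ships a read.matrix.csr() reader for this format, which may be worth checking first.

```r
# Parse libsvm "label index:value index:value ..." lines into a dense
# data.frame. Columns that never appear in a row are filled with 0.
read_libsvm <- function(lines) {
  tokens <- strsplit(trimws(lines), "[[:space:]]+")
  labels <- as.numeric(vapply(tokens, `[`, character(1), 1))

  # Everything after the label is an "index:value" pair.
  pairs <- lapply(tokens, function(tk) tk[-1])
  cols <- lapply(pairs, function(p) as.integer(sub(":.*$", "", p)))
  vals <- lapply(pairs, function(p) as.numeric(sub("^[^:]*:", "", p)))

  n_col <- max(unlist(cols), 0L)
  X <- matrix(0, nrow = length(lines), ncol = n_col,
              dimnames = list(NULL, paste0("V", seq_len(n_col))))

  # Fill all non-zero cells at once using (row, column) index pairs.
  rows <- rep(seq_along(cols), lengths(cols))
  X[cbind(rows, unlist(cols))] <- unlist(vals)

  data.frame(label = labels, X)
}

d <- read_libsvm(c("1 5:1 27:3 345:10", "0 2:4"))
dim(d)  # 2 rows; label plus 345 feature columns
```

For a file, something like read_libsvm(readLines("train.txt")) would apply, where the file name is hypothetical.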
2009 Jul 18
1
svm works but tune.svm give error
Hello, I'm using the e1071 library for SVM functions. I can quickly train an SVM with: svm(formula = label ~ ., data = testdata) That works well. I want to tune the parameters, so I tried: tune.svm(label ~ ., data=testdata[1:2000, ], gamma=10^(-6:3), cost=10^(1:2)) THIS FAILS WITH AN ERROR: 'names' attribute [199] must be the same length as the vector [184] I don't
2011 Aug 23
2
dummy variables from factors
Hi, I'm looking at a large data set with many factors. I would like to expand each factor variable into multiple new variables, one for each level, with (0,1) coding. My first thought was just to code a big nasty loop that takes each level and cbinds a column onto my data set. But that seems painful. There must be a better way. Is there an "easy" way to do this in R? (Note, I don't want to
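This sketch avoids the hand-written cbind loop. It replaces each factor column with one 0/1 column per level via model.matrix(), and non-factor columns pass through unchanged. The function name expand_factors and the name_level column naming are my own. It assumes the factor columns have no NAs and at least two levels each.

```r
# Replace every factor column in a data.frame with 0/1 indicator columns,
# one per level. Assumes factor columns contain no NAs (model.matrix()
# would otherwise drop those rows and the cbind() would misalign).
expand_factors <- function(df) {
  pieces <- lapply(names(df), function(nm) {
    col <- df[[nm]]
    if (!is.factor(col)) {
      return(setNames(data.frame(col), nm))  # pass through unchanged
    }
    m <- model.matrix(~ col - 1)  # no intercept: keep every level
    colnames(m) <- paste(nm, levels(col), sep = "_")
    as.data.frame(m)
  })
  do.call(cbind, pieces)
}

df <- data.frame(id = 1:3,
                 color = factor(c("red", "blue", "red")),
                 size = factor(c("S", "M", "L")))
expand_factors(df)
```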
2012 May 18
2
Failure building any package
Hello, I'm attempting to build a package using R 2.15.0 on OS X. I am getting a generic failure when performing a CRAN-type check on the package. Even with a very simple test package, it still fails in the same place. Example: In R: rm(list=ls()) foo <- function(x){print(x)} package.skeleton(name="foo") Then, at the command line: R CMD build foo R CMD check --as-cran
2012 Feb 13
4
Reading in csv with footer
Hi, I have a CSV file that is formatted well, except that the last line is a "summary" line not in CSV format. Toy example: label_1, label_2, label_3 1,2,3 3,2,4 2,3,4 Total Rows: 3 When I try to import this into R with: d <- read.table("foo.csv", header=T, sep=",") It fails to import properly because of the last line. Currently, I have a shell script that strips
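One way to skip the shell-script step is sketched below. It reads the raw lines with readLines(), drops the trailing summary line(s), and passes the rest to read.csv() through its text argument. The function name read_csv_drop_footer and its n_footer parameter are hypothetical.

```r
# Read a CSV whose last n_footer lines are not CSV (e.g. "Total Rows: 3").
read_csv_drop_footer <- function(path, n_footer = 1) {
  lines <- readLines(path)
  body <- head(lines, -n_footer)  # everything except the footer
  read.csv(text = body, header = TRUE, strip.white = TRUE)
}

# Recreate the toy file from the question in a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("label_1,label_2,label_3", "1,2,3", "3,2,4", "2,3,4",
             "Total Rows: 3"), tmp)
d <- read_csv_drop_footer(tmp)
d
```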
2009 Oct 23
1
Data format for KSVM
Hi, I have a process using svm from the e1071 library. It works. I want to try using ksvm from the kernlab package instead. The same data used with e1071 gives me an error with ksvm. My data is a data.frame. Sample code: svm_formula <- formula(y ~ a + B + C) svm_model <- ksvm(formula, data=train_data, type="C-svc", kernel="rbfdot", C=1) I get the following error:
2012 May 18
4
Menus - best practices?
Hello, I need to design a fairly simple front-end for someone to use an R script system that I've built. My thought was to just use the text-based menus available in the base R package, perhaps in some kind of loop. How have other people done this? Are there any "best practices" that you can recommend? Thanks!
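Here is a sketch of the "menu in a loop" idea built on utils::menu(), which returns the number of the chosen item or 0 when the user exits. The actions live in a named list so the dispatch logic can be tested without a live prompt. The action names, their bodies, and the helper names run_choice and main_menu are hypothetical placeholders.

```r
# Actions offered by the menu (hypothetical placeholders).
actions <- list(
  "Load data"   = function() "data loaded",
  "Run model"   = function() "model run",
  "Show report" = function() "report shown"
)

# Run the action for a menu choice; 0 (menu()'s "exit") or an
# out-of-range number returns NULL.
run_choice <- function(choice) {
  if (choice >= 1 && choice <= length(actions)) actions[[choice]]() else NULL
}

main_menu <- function() {
  repeat {
    choice <- utils::menu(names(actions), title = "What would you like to do?")
    if (choice == 0) break  # user entered 0: leave the loop
    print(run_choice(choice))
  }
}

# menu() needs an interactive prompt, so only start the loop there.
if (interactive()) main_menu()
```

Keeping the choices as data means that adding a menu item is a one-line change to the list rather than another branch in the loop.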
2012 May 29
2
Converting to XTS loses data.frame structure
Hello, I noticed something odd when working with data frames and xts objects. If I read in a CSV file, R creates a nice data.frame. This works well. If I then convert to an xts object, I see that all the values in the data are now quoted. My data is a mix of numeric and character. This is usually seen when converting a data.frame to a matrix, as R will treat all the data as the same class.
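The behavior described above can be reproduced with base R alone. An xts object stores its data as a single matrix, so it can hold only one type, just like as.matrix(). The column names and values below are made up, and xts itself is not loaded in this sketch.

```r
# One character column forces the whole matrix to character.
trades <- data.frame(price = c(10.5, 11.2), volume = c(100, 250),
                     note = c("ok", "halt"))
typeof(as.matrix(trades))  # "character"

# Converting only the numeric columns keeps numeric storage; the character
# column can be kept separately (e.g. in a data.frame keyed by date).
num_cols <- vapply(trades, is.numeric, logical(1))
typeof(as.matrix(trades[num_cols]))  # "double"
```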
2011 Sep 02
2
Avoiding for Loop for moving average
Hello, I need to calculate a moving average and an exponentially weighted moving average over a fairly large data set (500K rows). Doing this in a for loop works nicely, but is slow. N <- nrow(data); data$ewma <- numeric(N); data$ewma[1] <- data$value[1]; for (i in 2:N) { data$ewma[i] <- alpha * data$ewma[i-1] + (1 - alpha) * data$value[i] } Since the moving average "accumulates" as we move through the data,
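stats::filter() can replace both loops. With method = "recursive" it computes y[i] = x[i] + a * y[i-1] in compiled code, which matches the recurrence above once the input is scaled by (1 - alpha). With the default convolution method and sides = 1 it gives a trailing moving average. The function names are hypothetical, and the weight convention (alpha on the previous EWMA value) follows the question.

```r
# EWMA following: ewma[i] = alpha * ewma[i-1] + (1 - alpha) * value[i],
# with ewma[1] = value[1].
ewma_fast <- function(value, alpha) {
  # Recursive filter: y[i] = x[i] + alpha * y[i-1]. Feeding in
  # (1 - alpha) * value with init = value[1] makes y[1] = value[1].
  as.numeric(stats::filter((1 - alpha) * value, filter = alpha,
                           method = "recursive", init = value[1]))
}

# Simple moving average of the current and previous k - 1 values
# (sides = 1 makes it trailing); the first k - 1 entries are NA.
sma_fast <- function(value, k) {
  as.numeric(stats::filter(value, rep(1 / k, k), sides = 1))
}

# Reference loop version, for checking the vectorized one.
ewma_loop <- function(value, alpha) {
  out <- numeric(length(value))
  out[1] <- value[1]
  for (i in seq_along(value)[-1]) {
    out[i] <- alpha * out[i - 1] + (1 - alpha) * value[i]
  }
  out
}

v <- c(1, 2, 3, 4, 5)
ewma_fast(v, alpha = 0.5)  # c(1, 1.5, 2.25, 3.125, 4.0625)
sma_fast(v, k = 2)         # c(NA, 1.5, 2.5, 3.5, 4.5)
```

Writing stats::filter rather than filter avoids picking up a different filter() if another package masks it.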
2012 Oct 14
4
Date Math
Hello, I have a time series object (xts) that I iterate over in a loop. It works fine. My challenge is that I want to be able to reference other entries in the series by date arithmetic, e.g., for today's observation, what were the last 5 observations? If indexed numerically, it is trivial, but I can't figure out how to do this with dates. This is slightly more difficult as there may not be an
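The question mixes two readings of "last 5". One is the last 5 observations, however many days they span. The other is everything in the last 5 calendar days, which can be fewer observations when dates are missing. This base-R sketch shows both on a sorted Date vector with made-up values. With xts, the same logic would apply to index(x) and coredata(x). The helper names are hypothetical.

```r
# Irregular daily series (weekend and 2012-10-03 missing); values are made up.
dates  <- as.Date(c("2012-10-01", "2012-10-02", "2012-10-04",
                    "2012-10-05", "2012-10-08", "2012-10-09"))
values <- c(10, 11, 12, 13, 14, 15)

# The last n observations strictly before `day` (dates assumed sorted).
last_n_before <- function(dates, values, day, n = 5) {
  idx <- which(dates < day)
  values[tail(idx, n)]
}

# Observations within the k calendar days before `day`.
within_days_before <- function(dates, values, day, k = 5) {
  values[dates >= day - k & dates < day]
}

today <- as.Date("2012-10-09")
last_n_before(dates, values, today, n = 5)       # c(10, 11, 12, 13, 14)
within_days_before(dates, values, today, k = 5)  # c(12, 13, 14)
```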
2012 Feb 28
6
Cleaning up messy Excel data
Unfortunately, some data I need to work with was delivered in a rather messy Excel file. I want to import it into R and clean up some things so that I can do my analysis. Pulling in a CSV from Excel is the easy part. My current challenge is dealing with some text mixed in with the values, e.g., 118 5.7 <2.0 3.7. Since this column in Excel has a "<2.0" value, R reads the
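This sketch turns such a column into numbers while keeping track of which entries were "below limit" values. Treating "<2.0" as 2.0 plus a flag is an assumption made here for illustration; the right rule (e.g., half the limit, or NA) depends on the analysis. The function name clean_censored is hypothetical.

```r
# Split values like "<2.0" into a number and a "below limit" flag.
clean_censored <- function(x) {
  # as.character() first, in case the column was read in as a factor.
  x <- trimws(as.character(x))
  below <- startsWith(x, "<")
  value <- as.numeric(sub("^<\\s*", "", x))  # assumption: "<2.0" -> 2.0
  data.frame(value = value, below_limit = below)
}

raw <- c("118", "5.7", "<2.0", "3.7")
clean_censored(raw)
```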
2009 Aug 12
1
nominal to numeric function
Hi, I'm training an SVM (C-classification from the e1071 library). Some of the variables in my data set are nominal. Is there an easy/automatic way to convert them to numerical representations? Thanks, -N