similar to: removing factor level represented by less than x rows

Displaying 20 results from an estimated 10000 matches similar to: "removing factor level represented by less than x rows"

2010 Jun 09
4
question about "mean"
Hi there: I have a question about generating mean value of a data.frame. Take iris data for example, if I have a data.frame looking like the following: --------------------- Sepal.Length Sepal.Width Petal.Length Petal.Width Species 1 5.1 3.5 1.4 0.2 setosa 2 4.9 3.0 1.4 0.2
2017 Oct 28
2
Cannot Compute Box's M (Three Days Trying...)
Hey Duncan, Hard to debug? That's an understatement. Eyes bleeding.... In any case, I tried all your suggestions. To get "integer" for the final column, I had to change the code to get integers instead of strings. double[] d1 = ((REXPVector) ((RList) tableRead).get(0)).asDoubles(); double[] d2 = ((REXPVector) ((RList) tableRead).get(1)).asDoubles(); double[] d3 = ((REXPVector)
2017 Oct 28
2
Cannot Compute Box's M (Three Days Trying...)
Thanks Duncan. Awesome ideas! I think we're getting closer! I tried what you suggested and got a possibly better error... . . . rConnection.assign("boxMVariable", myDf); String resultBV = "str(boxMVariable)"; // your suggestion. RESULTING ERROR: Error in format.default(nam.ob, width = max(ncn), justify = "left") : invalid 'width' argument (No idea
2010 Sep 21
5
removed data is still there!
I'm confused, hope someone can point out what is not obvious to me. I thought I was creating a new data frame by 'deleting' rows from an existing dataframe - I've tried 2 methods. But this new data frame seems to remember values from its parent - even though there are no occurrences. Where does it get the values versicolor and virginica from and give them a count of 0? What
2018 Mar 23
2
aggregate() naming -- bug or feature
In the examples below, the first loses the name attached by foo(), the second retains names attached by bar(). Is this an intentional difference? I'd prefer that the names be retained in both cases. foo <- function(x) { c(mean = base::mean(x)) } bar <- function(x) { c(mean = base::mean(x), sd = stats::sd(x))} aggregate(iris$Sepal.Length, by = list(iris$Species), FUN = foo) #>
2007 Apr 29
1
randomForest gives different results for formula call v. x, y methods. Why?
Just out of curiosity, I took the default "iris" example in the RF helpfile... but seeing the admonition against using the formula interface for large data sets, I wanted to play around a bit to see how the various options affected the output. Found something interesting I couldn't find documentation for... Just like the example... > set.seed(12) # to be sure I have
2017 Oct 29
2
Cannot Compute Box's M (Three Days Trying...)
Thanks Duncan. I can't tell you how helpful all your terrific replies have been. I think the biggest surprise is that nobody appears to be using Java and R together like I'm trying to do. I suppose it should be a surprise since there are no books on the subject and almost no technical documentation other than a few sites here and there. ----- I originally had the "int" as the
2005 Sep 26
3
How to get the rowindices without using which?
Hi, I was wondering if it is possible to get the rowindices without using the function "which" because I don't have a restriction criteria. Here's an example of what I mean: # take 10 randomly selected instances iris[sample(1:nrow(iris), 10),] # output Sepal.Length Sepal.Width Petal.Length Petal.Width Species 76 6.6 3.0 4.4 1.4
2018 Mar 23
1
aggregate() naming -- bug or feature
On Fri, Mar 23, 2018 at 6:43 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote: > Hello, > > Not exactly an answer but here it goes. > If you use the formula interface the names will be retained. Also if you pass named arguments: aggregate(iris["Sepal.Length"], by = iris["Species"], FUN = foo) # Species Sepal.Length # 1 setosa 5.006 # 2
2009 Feb 26
1
Random Forest confusion matrix
Dear R users, I have a question on the confusion matrix generated by function randomForest. I used the entire data set to generate the forest, for example: > print(iris.rf) Call: randomForest(formula = Species ~ ., data = iris, importance = TRUE, keep.forest = TRUE) confusion setosa versicolor virginica class.error setosa 50 0 0 0.00
2011 Feb 18
1
segfault during example(svm)
If do: > library("e1071") > example(svm) I get: svm> data(iris) svm> attach(iris) svm> ## classification mode svm> # default with factor response: svm> model <- svm(Species ~ ., data = iris) svm> # alternatively the traditional interface: svm> x <- subset(iris, select = -Species) svm> y <- Species svm> model <- svm(x, y) svm>
2013 Jan 16
4
Get a percent variable based on group
Dear all, I'd like to get a percentage variable based on a group, but without creating a new data frame. For example: data(iris) iris$percent <-unlist(tapply(iris$Sepal.Length,iris$Species,function(x) x/sum(x, na.rm=TRUE))) This does not work, I should have only three standard values, respectively for setosa, versicolor, and virginica. How can I do this? MANY THANKS, Karine
2005 Aug 26
2
problem with certain data sets when using randomForest
Hi, Since I've had no replies on my previous post about my problem, I am posting it again in the hope someone notices it. The problem is that the randomForest function doesn't take datasets which have instances only containing a subset of all the classes. So the dataset with instances that either belong to class "a" or "b" from the levels "a", "b" and
2012 Jun 11
1
saving sublist lda object with save.image()
Greetings R experts, I'm having some difficulty recovering lda objects that I've saved within sublists using the save.image() function. I am running a script that exports a variety of different information as a list, included within that list is an lda object. I then take that list and create a list of that with all the different replications I've run. Unfortunately I've been
2011 Aug 16
3
Newbie question - struggling with boxplots
Hopefully I will not be flamed for this on the list, but I am starting out with R and having some trouble with combining plots. I am playing with the famous iris dataset (checking out example dataset in R while reading through Introduction to datamining) What I would like to do is create three graphs (combined boxplots) beside each other for each of the three species (Setosa, Versicolour and
2004 Aug 21
2
more on apply on data frame
Hi R People: Several of you pointed out that using "tapply" on a data frame will work on the iris data frame. I'm still having a problem. The iris data frame has 150 rows, 5 variables. The first 4 are numeric, while the last is a factor, which has the Species names. I can use tapply for 1 variable at a time: >tapply(iris[,1],iris[,5],mean) setosa versicolor virginica
2003 Jun 13
1
problem with latex of object summary reverse
Hi, I have the following problem (library Hmisc loaded, iris data loaded, R Version 1.7.0 (2003-04-16), packages updated, running on a linux Debian i386): > summary(Species~Sepal.Length,method="reverse")->a > a Descriptive Statistics by Species +------------+-----------------+-----------------+-----------------+ | |setosa |versicolor |virginica
2017 Oct 28
2
Cannot Compute Box's M (Three Days Trying...)
I'm not sure what you mean. Could you please be more specific? If I print the string, I get: boxM(boxMVariable[, -5], boxMVariable[, 5]) From this code: . . . // assign the data to a variable. rConnection.assign("boxMVariable", myDf); // create a string command with that variable name. String boxVariable = "boxM(boxMVariable[, -5], boxMVariable[, 5])";
2005 Feb 08
1
Toying with neural networks
Hello all, I've been playing with nnet (package 'nnet') and I've come across this problem. nnet doesn't seem to like to have more than 1000 weights. If I do: > data(iris) > names(iris)[5] <- "species" > net <- nnet(species ~ ., data=iris, size=124, maxit=10) # weights: 995 initial value 309.342009 iter 10 value 21.668435 final value 21.668435 stopped after 10
2017 Oct 29
3
Renjin?
Hi All, OK, in the "back to the drawing board" department, I found what looks like a much better solution to using R in Java. Renjin. Looking at the docs and then trying a quick example, didn't quite work. Of course I'm missing something. Although I'm telling the engine to require ("biotools") just like I would in R itself, when I get to the line of code that