similar to: removing duplicate rows

Displaying 20 results from an estimated 5000 matches similar to: "removing duplicate rows"

2010 May 06
2
splitting character strings and converting to numeric vectors
This seemingly should be quite simple but I can't solve it: I have a long character vector of geographic data (data frame column named "XY") whose elements vary in length (from 11 to 14 chars). Each element is structured as a set of digits, then an underscore, then more digits, e.g.:
> data.frame(head(as.character(XY)))
  head.as.character.XY..
1         -448623_854854
2
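One possible approach to the question above, sketched with made-up sample values (the vector `xy` below is hypothetical and only mimics the "digits, underscore, digits" layout described): split each element on the underscore with `strsplit()` and convert the two halves with `as.numeric()`.

```r
# Hypothetical sample data in the "digits_digits" layout from the post
xy <- c("-448623_854854", "-44862_85485")

# Split each element on the literal underscore; the result is a list of
# character vectors, one per element
parts <- strsplit(xy, "_", fixed = TRUE)

# Take the first and second piece of every element and convert to numeric
x <- as.numeric(sapply(parts, `[`, 1))
y <- as.numeric(sapply(parts, `[`, 2))
```

This assumes every element contains exactly one underscore; elements without one would yield `NA` for `y`.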
2009 Aug 10
4
problem selecting rows meeting a criterion
When I try to select only those rows from the following data frame, called "data", in which X > Y
   X Y       V3
2  2 1 8.062258
3  3 1 2.236068
4  4 1 6.324555
5  5 1 5.000000
6  1 2 8.062258
8  3 2 9.486833
9  4 2 2.236068
10 5 2 5.656854
11 1 3 2.236068
12 2 3 9.486833
14 4 3 8.062258
15 5 3 5.099020
16 1 4 6.324555
17 2 4 2.236068
18 3 4 8.062258
20 5 4 5.385165
21 1 5 5.000000
2010 Jul 07
3
quantiles on rows of a matrix
I'm trying to obtain the mean of the middle 95% of the values from each row of a matrix (that is, the highest and lowest 2.5% of values in each row are removed before calculating the mean). I am having all sorts of problems with this; for example the command: apply(matrix1,1,function(x) quantile(c(.05,.90),na.rm=T)) returns the exact same quantile values for each row, which is clearly
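In the command quoted above, `quantile()` is given `c(.05,.90)` as its data argument and never sees `x`, which is why every row returns the same values. A minimal sketch of one way to get a row-wise mean of the middle 95%, assuming the cut-offs are the 2.5% and 97.5% quantiles of each row (the matrix `m` and the helper `trimmed_row_mean` are illustrative names, not from the post):

```r
# Illustrative matrix: row 1 holds 1..100, row 2 holds 101..200
m <- matrix(1:200, nrow = 2, byrow = TRUE)

# Hypothetical helper: mean of the values lying between the lower and
# upper quantiles of x (NAs are ignored)
trimmed_row_mean <- function(x, lower = 0.025, upper = 0.975) {
  q <- quantile(x, probs = c(lower, upper), na.rm = TRUE)
  mean(x[x >= q[1] & x <= q[2]], na.rm = TRUE)
}

# Apply the helper to each row (MARGIN = 1)
row_means <- apply(m, 1, trimmed_row_mean)
```

Note that `mean(x, trim = 0.025)` is a built-in alternative, though it trims a fixed fraction of observations rather than cutting at quantile values, so results can differ slightly.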
2009 Nov 24
2
linear regression on groups of consecutive rows of a matrix
I want to perform linear regression on groups of consecutive rows--say 5 to 10 such--of two matrices. There are many such potential groups because the matrices have thousands of rows. The matrices are both of the form:
> shp[1:5,16:20]
      SL495B SL004C SL005C SL005A SL017A
-2649   1.06   0.56     NA     NA     NA
-2648   0.97   0.57     NA     NA     NA
-2647   0.46   0.30     NA     NA
2009 Aug 04
2
R's database capabilities
I admit that I've not done a thorough search on this topic, but from the several instructional manuals and/or tutorials I've looked at, I don't see any mention of relational database capabilities in R? Have I missed something, and if so, can someone point me in the right direction to get started? Thanks! Jim Bouldin, PhD Research Ecologist Department of Plant Sciences, UC Davis
2009 Jul 23
5
error message: .Random.seed is not an integer vector but of type 'list'
I'm trying to run this simple random sample procedure and keep getting the error message shown. I don't understand this; I've designated x as a numeric vector, so what is going on here? Thanks.
> x = as.vector(c(1:12));x
 [1]  1  2  3  4  5  6  7  8  9 10 11 12
> mode(x)
[1] "numeric"
> sample(x, 3)
Error in sample(x, 3) : .Random.seed is not an integer vector
2009 Apr 30
1
unloading loaded packages
I can't seem to find info on how to unload packages that have been loaded. My goal in doing so is to gain access to functions that have been masked out by those packages. Or is there another way to do so? Thanks in advance. Jim Bouldin, PhD Research Ecologist Department of Plant Sciences, UC Davis Davis CA, 95616 530-554-1740
2009 Dec 28
1
nls error message
When I try to run the following non-linear regression with variables index1 and prl3:
> beta = 4
> nls(index1~beta*(1/prl3),start = list(beta = 4))
I get this error message:
Error in nls(index1 ~ beta * (1/prl3), start = list(beta = 4)) : REAL() can only be applied to a 'numeric', not a 'logical'
I've got no clue as to the REAL() to which this is referring. Any
2010 Jan 03
1
calculations on columns with partially matching names
Is there a command for partial matching of character strings? Specifically, I'd like to be able to calculate the mean of the values in any columns in a data frame or matrix that have identity in part of their column names. For example, columns labeled "mpw06a" and "mpw06b" match on the first five characters; their mean would be taken whereas any columns beginning with
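A rough sketch of one way to group columns by a shared name prefix, assuming the match is on the first five characters as in the "mpw06a"/"mpw06b" example (the data frame `df` and its values are invented for illustration, and a row-wise mean within each group is only one reading of the question):

```r
# Hypothetical data: two columns share the prefix "mpw06", one does not
df <- data.frame(mpw06a = c(1, 2), mpw06b = c(3, 4), xyz01a = c(5, 6))

# Grouping key: first five characters of each column name
prefix <- substr(names(df), 1, 5)

# For each prefix, take the row-wise mean over the columns sharing it;
# the result is a matrix with one column per prefix
group_means <- sapply(split(names(df), prefix),
                      function(cols) rowMeans(df[cols]))
```

For pattern-based rather than position-based matching, `grep()` or `startsWith()` on `names(df)` would serve the same purpose.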
2009 Dec 04
1
no html help upon upgrading to 2.10
I just upgraded from 2.8.1 to 2.10 on Windows Vista. BIG MISTAKE apparently because now when I type: > help(functionname) or ?functionname I get only a small text window giving some very basic info on the topic, e.g.: base-package package:base R Documentation The R Base Package Description: Base R functions Details: This package contains the
2010 Mar 11
3
NAs and row/column calculations
I continue to have great frustrations with NA values--in particular making summary calculations on rows or cols of a matrix containing them. For example, why does:
> a = matrix(1:30,nrow=5)
> is.na(a[c(1:2),c(3:4)]);a
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    6   NA   NA   21   26
[2,]    2    7   NA   NA   22   27
[3,]    3    8   13   18   23   28
[4,]    4    9   14   19   24   29
2009 Nov 21
7
consecutive numbering of elements in a matrix
Within a very large matrix composed of a mix of values and NAs, e.g., matrix A:
     [,1] [,2] [,3]
[1,]    1   NA   NA
[2,]    3   NA   NA
[3,]    3   10   17
[4,]    4   12   18
[5,]    6   16   19
[6,]    6   22   20
[7,]    5   11   NA
I need to be able to consecutively number, in new columns, the non-NA values within each column (i.e. A[1,1] A[3,2] and A[3,3] would all be set to one, and
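A minimal sketch of one way to do this numbering, using the matrix A shown above: within each column, a running count of non-NA entries (`cumsum(!is.na(col))`) gives the consecutive index, and NA positions are kept as NA. The name `N` for the result is an assumption; the post asks for new columns, which could then be attached with `cbind(A, N)`.

```r
# Matrix A as given in the post
A <- cbind(c(1, 3, 3, 4, 6, 6, 5),
           c(NA, NA, 10, 12, 16, 22, 11),
           c(NA, NA, 17, 18, 19, 20, NA))

# For each column, number the non-NA values 1, 2, 3, ... in order of
# appearance and leave NA wherever the original value is NA
N <- apply(A, 2, function(col) ifelse(is.na(col), NA, cumsum(!is.na(col))))
```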
2009 Jul 23
5
Random # generator accuracy
Dan Nordlund wrote: "It would be necessary to see the code for your 'brief test' before anyone could meaningfully comment on your results. But your results for a single test could have been a valid "random" result." I've re-created what I did below. The problem appears to be with the weighting process: the unweighted sample came out much closer to the actual
2009 Jul 23
1
error message: .Random.seed is not an integer vector but
Thanks much Ted. I actually had just tried what you suggest here before you posted, and resolved the problem. Thanks also for the other tips. I wrote x = as.vector(c(1:12)) because I thought that the mode of x might be the problem, the error message pointing to .Random.seed notwithstanding. On a related note, I did a brief test a couple weeks back where I ran a million random samples of 3 from
2011 Sep 23
2
converting object elements to variable names and making subsequent assignments thereto
This has got to be incredibly simple but I nevertheless can't figure it out as I am apparently brain dead. I just want to convert the elements of a character vector to variable names, so as to then assign formulas to them, e.g: z = c("model1","model2"); I want to assign formulas, such as lm(y~x[,1]) and lm(y~x[,2]), to the variables "model1" and
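Two common ways to approach this, sketched with simulated data (`x` and `y` below are invented; only `z` comes from the post): keep the fitted models in a named list, or create free-standing variables with `assign()`.

```r
# Simulated data for illustration only
set.seed(42)
x <- matrix(rnorm(40), ncol = 2)
y <- rnorm(20)

z <- c("model1", "model2")

# Option 1: a list of fits named after the elements of z
models <- setNames(lapply(seq_along(z), function(i) lm(y ~ x[, i])), z)

# Option 2: create variables called "model1" and "model2" directly
for (i in seq_along(z)) assign(z[i], lm(y ~ x[, i]))
```

The named list is usually easier to work with afterwards, since `models[["model1"]]` or `lapply(models, summary)` avoids looking variables up by name with `get()`.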
2010 Dec 26
2
object names from character strings
I realize this is probably pretty basic but I can't figure it out. I'm looping through an array, doing various calculations and producing a resulting data frame in each loop iteration. I need to give each data frame a different name. Although I can easily create a new character string for writing each frame to an output file, I cannot figure out how to convert such strings to
2011 Aug 04
3
functions on rows or columns of two (or more) arrays
I realize this should be simple, but even after reading over the several help pages several times, I still cannot decide between the myriad "apply" functions to address it. I simply want to apply a function to all the rows (or columns) of the same index from two (or more) identically sized arrays (or data frames). For example: > a=matrix(1:50,nrow=10) >
2011 Oct 24
3
extract the p value
OK, what is the trick to extracting the overall p value from an lm object? It shows up in the summary(lm(model)) output but I can't seem to extract it:
> test2 = apply(aa, 1, function(x) summary(lm(x[,1] ~ 0 + x[,3] + x[,6])))
> test2[[1]]
Call:
lm(formula = x[, 1] ~ 0 + x[, 3] + x[, 6])
[omitted summary output]
F-statistic: 40.94 on 2 and 7 DF, p-value: 0.0001371
It does not seem
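The overall p-value is not stored in the summary object; it is computed when the summary is printed. A sketch of recomputing it from the stored F statistic, shown here on the built-in `cars` data set rather than the poster's `aa` (the names `fit`, `fstat` and `p_value` are illustrative):

```r
# Example fit on a built-in data set
fit <- lm(dist ~ speed, data = cars)

# summary.lm stores the F statistic as a named vector:
# "value", "numdf" (numerator df) and "dendf" (denominator df)
fstat <- summary(fit)$fstatistic

# Upper-tail probability of the F distribution gives the overall p-value
p_value <- unname(pf(fstat["value"], fstat["numdf"], fstat["dendf"],
                     lower.tail = FALSE))
```

The same three lines can be wrapped in a function and applied to each element of a list of summaries such as `test2`.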
2009 Sep 24
1
subsetting from a vector or matrix
I realize this should be simple but I'm having trouble subsetting vectors and matrices, for example extracting all values meeting a certain criterion, from a vector. Cannot seem to figure out the correct syntax and help page not very helpful. Or should I be using some other function than subset. Thanks for any help. Jim Bouldin
2010 Jul 09
2
nls error regarding numerics vs logicals
I am trying to perform an nls for a valid negative exponential function:
zz=nls(y~constant+a.est*2.7183^(b.est*x),start=list(constant=4.0,a.est=-4,b.est = -.005),trace=T)
and am getting a number of different error messages, the most problematic of which is "Error in nls(ring.area ~ constant + a.est * 2.7183^(b.est * ba.beg), start = list(constant = 4, : REAL() can only be applied to a