Displaying 20 results from an estimated 40000 matches similar to: "which rows are duplicates?"
2013 Apr 12
1
Removing rows that are duplicates but column values are in reversed order
Hi,
From your example data,
dat1<- read.table(text="
id1   id2   value
a     b     10
c     d     11
b     a     10
c     e     12
",sep="",header=TRUE,stringsAsFactors=FALSE)
#it is easier to get the output you wanted
dat1[!duplicated(dat1$value),]
#  id1 id2 value
#1   a   b    10
#2   c   d    11
#4   c   e    12
But, if you have cases like the one
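For pairs that may appear in either order, one possible approach (a minimal sketch, not necessarily what the original reply went on to suggest) is to build an order-independent key with pmin() and pmax() and call duplicated() on that key. The names `lo`, `hi` and `res` are illustrative only.

```r
# Example data from the post above.
dat1 <- data.frame(
  id1   = c("a", "c", "b", "c"),
  id2   = c("b", "d", "a", "e"),
  value = c(10, 11, 10, 12),
  stringsAsFactors = FALSE
)

# Order-independent key: the smaller id first, the larger id second.
lo <- pmin(dat1$id1, dat1$id2)
hi <- pmax(dat1$id1, dat1$id2)

# Keep the first row of each unordered (id1, id2) pair.
res <- dat1[!duplicated(data.frame(lo, hi)), ]
res
```

Unlike deduplicating on `value` alone, this keeps two rows that happen to share a value but involve different id pairs.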
2006 Oct 30
2
which duplicated rows to delete
Hi
Say I've this vector with several duplicates
> x <- c(1,2,3,4,2,6,2,8,2,3)
> which(duplicated(x))
[1]  5  7  9 10
But what I really want is something like:
List({2,5,7,9}, {3,10}, ...)
Then from each sublist I can specify which of the duplicate items to drop
res<-NULL
for(vec in myDuplicateList)
res<-rbind(res, subset(data[vec,], myCrit))
I'll get some of the way by
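A minimal sketch of one way to get the grouped positions, assuming base R only: split() the positions by value and keep the groups with more than one member. The names `groups` and `dup_groups` are illustrative.

```r
x <- c(1, 2, 3, 4, 2, 6, 2, 8, 2, 3)

# Positions of x grouped by the value found at each position.
groups <- split(seq_along(x), x)

# Keep only the values that occur more than once.
dup_groups <- groups[lengths(groups) > 1]
dup_groups
```

Each element of `dup_groups` is named after the duplicated value and holds every position where it occurs, so a loop over the list can decide which positions to drop.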
2006 Jan 05
1
Memory limitation in GeoR - Windows or R?
Dear Aaron,
I am really a tool user and not a tool maker (actually an ecologist
doing some biostatistics)... so, I take the liberty of sending a copy of
this e-mail to the r-help list where capable computer persons and true
statisticians may provide more relevant information and also to Paulo
Ribeiro and Peter Diggle, the authors of geoR.
I really feel that your huge matrix cannot be
2007 Feb 26
1
Adding duplicates by rows
Hi,
I am trying to add duplicates of matrix "mat" by row. Commands
subset(mat,duplicated(rownames(mat)))
or
mat[which(duplicated(rownames(mat))),]
return only half of the required indices. How can I find the remaining
ones, ie the matches, so that I can add them up?
Thanks,
Serguei
___________________________________________________________________
Austrian Institute of Economic
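If the goal is to add up all rows that share a row name, a hedged sketch using base R's rowsum() may be enough. The small matrix below is made-up illustration data, not the poster's "mat".

```r
# Hypothetical matrix with a repeated row name "a".
mat <- matrix(1:8, ncol = 2,
              dimnames = list(c("a", "b", "a", "c"), c("v1", "v2")))

# rowsum() sums all rows that share a group label (here, the row name).
summed <- rowsum(mat, group = rownames(mat))
summed

# All rows involved in a duplication, not just the later copies.
rn <- rownames(mat)
involved <- duplicated(rn) | duplicated(rn, fromLast = TRUE)
mat[involved, , drop = FALSE]
```

duplicated() on its own flags only the second and later copies, which is why it returns "half" of the matching rows.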
2011 Dec 08
1
partial duplicates of dataframe rows, indexing and removal
Hello. I am trying to remove from my dataframe, those rows in which the first
7 columns are duplicated even if subsequent columns make those rows unique.
df<-data.frame(id=rep(c('amy','bob','joe') , each=5),
pet1=sample(LETTERS[1:3],15, replace=T),
pet2=sample(LETTERS[1:3],15, replace=T),
pet3=sample(LETTERS[1:5],15, replace=T))
>df
id pet1 pet2
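A minimal sketch of deduplicating on a subset of columns, assuming the key columns are named as in the example data frame; `key_cols` and `res` are illustrative names, and set.seed() is added only to make the random data reproducible.

```r
set.seed(1)
df <- data.frame(id   = rep(c("amy", "bob", "joe"), each = 5),
                 pet1 = sample(LETTERS[1:3], 15, replace = TRUE),
                 pet2 = sample(LETTERS[1:3], 15, replace = TRUE),
                 pet3 = sample(LETTERS[1:5], 15, replace = TRUE))

# Columns that define a duplicate; the other columns are ignored.
key_cols <- c("id", "pet1", "pet2")

# Keep the first row for each combination of the key columns.
res <- df[!duplicated(df[, key_cols]), ]
res
```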
2012 Jul 23
1
duplicated() variation that goes both ways to capture all duplicates
Dear all
The trouble with the current duplicated() function is that it can
report duplicates while searching fromFirst _or_ fromLast, but not
both ways. Often users will want to identify and extract all the
copies of the item that has duplicates, not only the duplicates
themselves.
To take the example from the man page:
> data(iris)
> iris[duplicated(iris), ] ##duplicates while
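A common workaround, sketched here under the assumption that base R is sufficient, is to combine a forward and a backward pass of duplicated() with a logical OR. The names `all_dups` and `iris_all` are illustrative.

```r
x <- c(9, 7, 9, 3, 7)

# TRUE for every element whose value occurs more than once,
# including the first occurrence.
all_dups <- duplicated(x) | duplicated(x, fromLast = TRUE)
x[all_dups]

# The same idea applied to the rows of a data frame.
data(iris)
iris_all <- iris[duplicated(iris) | duplicated(iris, fromLast = TRUE), ]
iris_all
```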
2011 Apr 08
5
duplicates() function
I need a function which is similar to duplicated(), but instead of
returning TRUE/FALSE, returns indices of which element was duplicated.
That is,
> x <- c(9,7,9,3,7)
> duplicated(x)
[1] FALSE FALSE TRUE FALSE TRUE
> duplicates(x)
[1] NA NA 1 NA 2
(so that I know that element 3 is a duplicate of element 1, and element
5 is a duplicate of element 2, whereas the others were
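Base R has no function called duplicates(); a minimal sketch of the behaviour described above can be built on match(), which returns the position of the first occurrence of each value.

```r
# Hypothetical helper: for each element, the index of the earlier element
# it duplicates, or NA if it is the first occurrence of its value.
duplicates <- function(x) {
  first <- match(x, x)                 # position of the first occurrence
  first[first == seq_along(x)] <- NA   # first occurrences are not duplicates
  first
}

x <- c(9, 7, 9, 3, 7)
duplicates(x)
```

For this `x` the result is NA NA 1 NA 2, matching the output requested in the post.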
2009 Dec 17
1
Remove duplicates from a data frame but with some special requirements
Hi all.
So I have a data frame with multiple columns/variables. The first variable
is a major sample name for which there are some sub-samples. Currently I
have used the following command to remove the duplicates:
Samps_working<-Samps[-c(which(duplicated(Samps$ESR_Ref_edit))),]
This removes all of the duplicated sample rows.
However, I just realised that, of course, this removes the first
2005 Mar 28
2
Generating list of vector coordinates
Hi.
Can anyone suggest a simple way to obtain in R a list of vector
coordinates of the following form? The code below is Mathematica.
In[5]:=
Flatten[Table[{i,j,k},{i,3},{j,4},{k,5}], 2]
Out[5]=
{{1,1,1},{1,1,2},{1,1,3},{1,1,4},{1,1,5},{1,2,1},{1,2,2},{1,2,3},{1,2,4},
{1,2,5},{1,3,1},{1,3,2},{1,3,3},{1,3,4},{1,3,5},{1,4,1},{1,4,2},{1,4,3},
{1,4,
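One possible R counterpart, offered as a sketch, uses expand.grid(). Because expand.grid() varies its first argument fastest, the arguments are given in reverse order (k, j, i) and the columns are then reordered to reproduce the Mathematica ordering. The name `coords` is illustrative.

```r
# k varies fastest, then j, then i, as in the Mathematica Table.
g <- expand.grid(k = 1:5, j = 1:4, i = 1:3)

# Reorder the columns to (i, j, k); each row is one coordinate triple.
coords <- as.matrix(g[, c("i", "j", "k")])
head(coords)

# As a list of vectors, if a list is really required.
coord_list <- lapply(seq_len(nrow(coords)), function(r) unname(coords[r, ]))
```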
2010 Apr 23
4
Remove duplicated rows
Hi all,
I have a dataset similar to the following
Name  Date       Value
A     1/01/2000  4
A     2/01/2000  4
A     3/01/2000  5
A     4/01/2000  4
A     5/01/2000  1
B     6/01/2000  2
B     7/01/2000  1
B     8/01/2000  1
I would like R to remove duplicates based on column 1 and 3 only. In
addition, I would like R to remove duplicates based on the underlying and
overlying row only. For example, for A, I would like to remove row 2 only
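One reading of this request is "drop a row when Name and Value both match the row immediately above it". Under that assumption, a minimal sketch compares each row with its predecessor; `same_as_prev` and `res` are illustrative names.

```r
dat <- data.frame(
  Name  = c("A", "A", "A", "A", "A", "B", "B", "B"),
  Date  = c("1/01/2000", "2/01/2000", "3/01/2000", "4/01/2000",
            "5/01/2000", "6/01/2000", "7/01/2000", "8/01/2000"),
  Value = c(4, 4, 5, 4, 1, 2, 1, 1),
  stringsAsFactors = FALSE
)

n <- nrow(dat)

# TRUE when a row repeats the Name and Value of the row directly above it.
same_as_prev <- c(FALSE,
                  dat$Name[-1] == dat$Name[-n] & dat$Value[-1] == dat$Value[-n])

res <- dat[!same_as_prev, ]
res
```

With this rule, row 4 (A, 4) is kept because the row directly above it has a different Value, whereas duplicated() on columns 1 and 3 would drop it.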
2005 Mar 18
3
extract rows in dataframe with duplicated column values
Hi
I want to extract all the rows in a data frame that have duplicates
for a given column.
I would expect this question to come up pretty often but I have
researched the archives and surprisingly couldn't find anything.
The best I can come up with is:
x <- data.frame(a=c(1,2,2,3,3,3), b=10)
xdup1 <- duplicated(x[,1])
xdup2 <- duplicated(x[,1][nrow(x):1])[nrow(x):1]
xAllDups <-
2009 Jul 20
2
I need to obtain all the rows in m1 in which the elements of m2 are present
Hi, could you help me please with this?
Suppose that we have the following matrix
m1<-matrix(c("a","7","a","i","o","u","i","1","2","3","4","5","6","7"),
ncol=2)
m1
[,1] [,2]
[1,] "a" "1"
[2,] "7" "2"
[3,]
2010 Mar 14
3
Removing Duplicates
Hi all,
I am starting fresh with a local repository of mails, which almost certainly have duplicates in them. I am going to use maildirs, and ensure all mails are input with CRLFs.
The question is: does anybody know how I can find and remove duplicates, either while injecting mail with IMAP, or afterward? I can use tools to find duplicate Message-IDs, but don't know of a way to remove
2010 Jul 13
2
Checking for duplicate rows in data frame efficiently
I wrote something to check for duplicate rows in a data frame, but it is too inefficient. Is there a way to do this without the nested loops?
This code correctly indicates rows 1-7, 1-8, 2-9 and 7-8 are duplicates.
> m <- matrix(c(1,1,1,1,1, 2,2,2,2,2, 6,6,6,6,6, 3,3,3,3,3, 4,4,4,4,4, 5,5,5,5,5, 1,1,1,1,1, 1,1,1,1,1, 2,2,2,2,2, 7,7,7,7,7), ncol=5, byrow=TRUE)
> df <- data.frame(m)
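A loop-free alternative, sketched with base R only: paste each row into a single key string and split the row numbers by that key. The separator "\r" is an arbitrary choice assumed not to occur in the data; `key`, `groups` and `dup_groups` are illustrative names.

```r
m <- matrix(c(1,1,1,1,1, 2,2,2,2,2, 6,6,6,6,6, 3,3,3,3,3, 4,4,4,4,4,
              5,5,5,5,5, 1,1,1,1,1, 1,1,1,1,1, 2,2,2,2,2, 7,7,7,7,7),
            ncol = 5, byrow = TRUE)
df <- data.frame(m)

# One string per row, built from all columns.
key <- do.call(paste, c(df, sep = "\r"))

# Row numbers grouped by identical key; keep groups with several rows.
groups <- split(seq_len(nrow(df)), key)
dup_groups <- unname(groups[lengths(groups) > 1])
dup_groups
```

Rows 1, 7 and 8 end up in one group and rows 2 and 9 in another, which covers the pairs 1-7, 1-8, 7-8 and 2-9 listed above.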
2010 Jun 07
3
Subsetting subsets of data.frames
Hey Everyone,
I have been stumped by this all day.
Basically, I have a data.frame of multiple columns. Of concern are "id" &
"date"
For some reason, oftentimes there are duplicates of data with the same date.
I would like to remove the duplicates per different id (removing duplicate
dates for the entire data.frame would leave nothing since different id's all
have
2010 Jul 29
3
Fwd: duplicates
-- Original message --
From: Dévaványai Agamemnón <devavanyai@citromail.hu>
To: r-hel@r-project.org, r-hel@r-project.org
Sent: 29 July 2010, 16:29
Subject: duplicates
Sorry!
I'll try again.
Dear R Users!
I have a data frame with duplicate cases. Var1 is duplicated by var2.
var1 var2 var3 var4 var5
1 4 500 1 2
1 3 200 2 5
1
2017 Sep 19
3
Jump Threading duplicates dbg.declare intrinsics for fragments, bug?
Hi,
I'm hitting an assertion "overlapping or duplicate fragments" in the
DWARF codegen in addFragmentOffset(). This originates from a
duplicated dbg.declare intrinsic, declaring the same fragment twice.
The duplicated call was generated by the jump threading pass.
I have a patch (see below) that simply removes such duplicates, but
I'm not sure whether that is the right
2009 Jul 24
3
Duplicated date values aren't duplicates
Dear list,
I just had a function (as.ltraj in Adehabitat) give me the following error:
"Error in as.ltraj(xy, id, date = da) : non unique dates for a given burst"
I checked my dates and got the following:
> dupes<-mydata$DateTime[duplicated(mydata$DateTime)]
> dupes
[1] (07/30/02 00:00:00) (08/06/03 17:45:00)
Is there a reason different dates would come up as duplicate
2009 May 14
4
Duplicates and duplicated
Hi everybody.
I want to identify not only the duplicate number but also the original number
that has been duplicated.
Example:
x=c(1,2,3,4,4,5,6,7,8,9)
y=duplicated(x)
rbind(x,y)
gives:
  [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x    1    2    3    4    4    5    6    7    8     9
y    0    0    0    0    1    0    0    0    0     0
i.e. the second 4 [,5] is a duplicate.
What I want is
2011 Feb 28
3
Problems using unique function and !duplicated
Hi, I am trying to remove rows that are duplicated across two or more
variables simultaneously in a small R data.frame. I am trying to reproduce the SAS
statements from a Proc Sort with Nodupkey for those familiar with SAS.
Here's my example data :
test <- read.csv("test.csv", sep=",", as.is=TRUE)
> test
date var1 var2 num1 num2
1 28/01/11 a 1 213 71
2