Displaying 20 results from an estimated 30000 matches similar to: "Select Original and Duplicates"
2012 Jul 23
1
duplicated() variation that goes both ways to capture all duplicates
Dear all
The trouble with the current duplicated() function in R is that it can
report duplicates while searching fromFirst _or_ fromLast, but not
both ways. Often users will want to identify and extract all the
copies of the item that has duplicates, not only the duplicates
themselves.
To take the example from the man page:
> data(iris)
> iris[duplicated(iris), ] ##duplicates while
2012 Sep 27
3
Keep rows in a dataset if one value in a column is duplicated
Hi,
I have a data set of observations by either one person or a pair of people.
I want to only keep the pair observations, and was using the code below
until it gave me the error " $ operator is invalid for atomic vectors". I am
just beginning to learn R, so I apologize if the code is really rough.
Basically I want to keep all the rows in the data set for which the value of
2012 Sep 10
4
Identifying duplicate rows?
Hi,
I am trying to identify duplicate values in a column in a data frame. The
duplicated function identifies the duplicate rows in the data frame but it
only does this for the second record, not both records. Is there a way to
mark both rows in the data frame as TRUE?
dfA$dups<-duplicated(dfA$Value)
dfA
Site State Value dups
929 VA 73 FALSE
929 VA 73 TRUE
930 VA 76 FALSE
930 VA 76 TRUE
931
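One common way to flag every copy of a repeated value, not just the later ones, is to combine a forward and a backward pass of duplicated(). The sketch below assumes a toy version of dfA: only the first four rows come from the excerpt above, and the Value in the fifth row is made up for illustration.

```r
# Toy data modelled on the excerpt; the fifth Value (80) is an assumption.
dfA <- data.frame(Site  = c(929, 929, 930, 930, 931),
                  State = "VA",
                  Value = c(73, 73, 76, 76, 80))

# duplicated() marks only the later copies; scanning from the end as well
# and OR-ing the two results marks the first copy too.
dfA$dups <- duplicated(dfA$Value) | duplicated(dfA$Value, fromLast = TRUE)

# Keep every row whose Value occurs more than once.
all_copies <- dfA[dfA$dups, ]
```

The same idiom works on whole data frames, e.g. duplicated(df) | duplicated(df, fromLast = TRUE).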
2008 Jan 10
5
Extracting last time value
I have a dataframe as follows:
Date time value
20110620 11:18:00 7
20110620 11:39:00 9
20110621 11:41:00 8
20110621 11:40:00 6
20110622 14:05:00 8
20110622 14:06:00 6
For every date, I want to extract the row that has the greatest time.
Therefore, ending up like:
20110620 11:39:00 9
20110621 11:41:00 8
20110622 14:06:00 6
I am using for loops (for every date, find largest time value) to do
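A loop-free alternative, offered only as a sketch: sort by date and time, then keep the last row of each date with duplicated(..., fromLast = TRUE). It assumes the times are zero-padded "HH:MM:SS" strings, so that sorting them as text matches chronological order; the object names d and last_per_date are hypothetical.

```r
# Data copied from the table in the excerpt above.
d <- data.frame(
  Date  = c(20110620, 20110620, 20110621, 20110621, 20110622, 20110622),
  time  = c("11:18:00", "11:39:00", "11:41:00", "11:40:00", "14:05:00", "14:06:00"),
  value = c(7, 9, 8, 6, 8, 6),
  stringsAsFactors = FALSE
)

# Sort so that the latest time of each date comes last within its date.
d <- d[order(d$Date, d$time), ]

# Scanning from the end, the first row seen for each Date is the latest one;
# every other row of that Date is flagged as a duplicate and dropped.
last_per_date <- d[!duplicated(d$Date, fromLast = TRUE), ]
```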
2011 Jan 20
6
Identify duplicate numbers and to increase a value
Hi everybody.
I want to identify duplicate numbers and increase the value by 0.01 each time it occurs.
Example:
x=c(1,2,3,5,6,2,8,9,2,2)
I want to do this:
1
2 + 0.01
3
5
6
2 + 0.02
8
9
2 + 0.03
2 + 0.04
I am trying to get something like this:
1
2.01
3
5
6
2.02
8
9
2.03
2.04
Actually I just know the way to identify the duplicated numbers
rbind(x, duplicated(x) |
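One possible way to get the output described above, as a sketch rather than the answer given in the thread: number the occurrences of each value with ave() and add that count, divided by 100, only where the value occurs more than once. The names is_dup, occurrence and y are introduced here for illustration.

```r
x <- c(1, 2, 3, 5, 6, 2, 8, 9, 2, 2)

# TRUE for every copy of a value that appears more than once.
is_dup <- duplicated(x) | duplicated(x, fromLast = TRUE)

# Running count of each value: 1 for its first occurrence, 2 for the second, ...
occurrence <- ave(x, x, FUN = seq_along)

# Add 0.01 per occurrence, but leave values that occur only once unchanged.
y <- x + ifelse(is_dup, occurrence / 100, 0)
```

Because the result is floating point, compare it with all.equal() rather than ==.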
2012 Oct 23
10
How to pick columns from a ragged array?
I have a large dataset (~1 million rows) of three variables: ID (patient's name), DATE (of appointment) and DIAGNOSIS (given on that date).
Patients may have been assigned more than one diagnosis at any one appointment - leading to two rows, same ID and DATE but different DIAGNOSIS.
The diagnoses may change between appointments.
I want to subset the data in two ways:
- define groups
2008 Nov 16
4
duplicate values
Hi R Users,
I have the following data frame:
 Datetime Temperature and many more columns
1 2008-6-1 00:00:00 5
2 2008-6-1 02:00:00 5
3 2008-6-1 03:00:00 6
4 2008-6-1 03:00:00 0
5 2008-6-1 04:00:00 6
6 2008-6-1 04:00:00 0
7 2008-6-1 05:00:00 7
8 2008-6-1 06:00:00
2013 Jan 18
5
select rows with identical columns from a data frame
I have a data frame with several columns.
I want to select the rows with no NAs (as with complete.cases)
and all columns identical.
E.g., for
--8<---------------cut here---------------start------------->8---
> f <- data.frame(a=c(1,NA,NA,4),b=c(1,NA,3,40),c=c(1,NA,5,40))
> f
a b c
1 1 1 1
2 NA NA NA
3 NA 3 5
4 4 40 40
--8<---------------cut
2008 Jul 09
3
randomly select duplicated entries
Using this data as an example
dat <- read.table(textConnection("Id myvar
12 1
12 2
12 6
34 9
34 4
34 8
65 15
65 23"), header = TRUE)
closeAllConnections()
how can I create another data set that does not have duplicate entries
for 'Id', but the included values
are randomly selected from the available ones.
Thanks!
Juliet
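A minimal sketch of one approach, not necessarily the one suggested in the thread: shuffle the rows, then keep the first row seen for each Id. After a uniform shuffle, the first row of each Id is a uniformly random pick among that Id's rows. The names shuffled and one_per_id are hypothetical, and set.seed() is only there to make the example reproducible.

```r
dat <- read.table(text = "Id myvar
12 1
12 2
12 6
34 9
34 4
34 8
65 15
65 23", header = TRUE)

set.seed(1)  # arbitrary seed, for reproducibility only

# Put the rows in random order.
shuffled <- dat[sample.int(nrow(dat)), ]

# Keep the first row encountered for each Id, i.e. a random one per Id.
one_per_id <- shuffled[!duplicated(shuffled$Id), ]
```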
2012 Mar 01
2
read.table issue with "#"
Hello,
>
> The problem is that I get the following error because anything after the
> # is ignored.
>
> Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,
> :
> line 6 did not have 500 elements
>
> R thinks that line 6 has only 2 elements because of the #.
>
Use 'readLines' instead, followed by 'strsplit'.
In the
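The readLines/strsplit suggestion above can be sketched as follows. The character vector raw stands in for the result of readLines() on the poster's file, which is not shown; its contents and the space separator are assumptions. As an alternative, read.table() can be told not to treat "#" as a comment marker via comment.char = "".

```r
# Stand-in for readLines("yourfile.txt"); the content is made up.
raw <- c("id value", "a 1#x", "b 2")

# Split each line on single spaces; "#" is treated like any other character.
fields <- strsplit(raw, " ", fixed = TRUE)

# Alternative: keep read.table() but switch off comment handling.
df <- read.table(text = raw, header = TRUE, comment.char = "",
                 stringsAsFactors = FALSE)
```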
2013 Jun 10
2
please check this
Hi,
Try this:
which(duplicated(res10Percent))
# [1] 117 125 157 189 213 235 267 275 278 293 301 327 331 335 339 367 369 371 379
#[20] 413 415 417 441 459 461 477 479 505
res10PercentSub1<-subset(res10Percent[which(duplicated(res10Percent)),],dummy==1) #most of the duplicated are dummy==1
res10PercentSub0<-subset(res10Percent[which(duplicated(res10Percent)),],dummy==0)
2012 Oct 20
2
Help with programming a tricky algorithm
Hi All,
I'm a little stumped by the following problem. I've got a dataset with
the following structure:
idxy ix iy country (other variables)
1 1 1 c1 x1
2 1 2 c1 x2
3 1 3 c1 x3
. . . . .
3739 55 67 c7 x3739
3740 55 68 c7 x3740
where ix and
2009 May 14
4
Duplicates and duplicated
Hi everybody.
I want to identify not only duplicate number but also the original number
that has been duplicated.
Example:
x=c(1,2,3,4,4,5,6,7,8,9)
y=duplicated(x)
rbind(x,y)
gives:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
x 1 2 3 4 4 5 6 7 8 9
y 0 0 0 0 1 0 0 0 0 0
i.e. the second 4 [,5] is a duplicate.
What I want is
2012 Aug 13
5
How can I get the Ids with Duplicated key and corresponding Ids with original key?
In this following example Id 4 is duplicated with Id 1.
Like this I want both Ids (Duplicated and Duplicated with). Can anyone help?
df <- data.frame(
"Publication" = c(1, 2, 3, 1, 4, 5, 2, 3),
"Reference" = c("a", "b", "c", "a", "d", "e", "b", "c"),
"Id"= c(1, 2, 3, 4,
2012 Nov 12
5
Matrix to data frame conversion
I have a matrix that I wanted to convert to a data frame. I could not
get it to work, so I resorted to exporting it to csv and reimporting it. Why did I fail
in the attempt, and how can I achieve what I wanted without this
roundabout route?
The original matrix:
> str(comb_model0)
num [1:90, 1:4] 3.5938 0.0274 0.0342 0.0135 0.0207 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:90]
2017 Dec 13
3
match and new columns
Hi all,
I have a data frame
tdat <- read.table(textConnection("A B C Y
A12 B03 C04 0.70
A23 B05 C06 0.05
A14 B06 C07 1.20
A25 A23 A12 3.51
A16 A25 A14 2,16"),header = TRUE)
I want to match tdat$B with tdat$A and populate the column values of tdat$A
( col A and Col B) in the newly created columns (col D and col E). Please
find my attempt and the desired output below
Desired output
2017 Dec 13
2
match and new columns
Thank you Rui,
I did not get the desired result. Here is the output from your script
A B C Y D E
1 A12 B03 C04 0.70 0 0
2 A23 B05 C06 0.05 0 0
3 A14 B06 C07 1.20 0 0
4 A25 A23 A12 3.51 1 1
5 A16 A25 A14 2,16 4 4
On Wed, Dec 13, 2017 at 4:36 PM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Hello,
>
> Here is one way.
>
> tdat$D <- ifelse(tdat$B %in% tdat$A,
2020 Feb 29
3
dput()
I think Robin knows about FAQ 7.31/floating point (author of
'Brobdingnag', among other numerical packages). I agree that this is
surprising (to me).
To reframe this question: is there way to get an *exact* ASCII
representation of a numeric value (i.e., guaranteeing the restored value
is identical() to the original) ?
.deparseOpts has
"digits17": Real and finite complex
2012 Nov 27
3
loop command to matrix
Dear UseRs, I am extremely sorry for a basic question. I have a matrix of 19 rows and 365 columns. What I want to do is the following. First I want to leave out column 1 and calculate the row-wise mean of the remaining columns, which will obviously give me 365 values in one column, and then subtract these values from the column I left out, i.e. col=1. Then I want to leave out column 2
2017 Dec 14
1
match and new columns
Hi Bill,
I put stringsAsFactors = FALSE
still did not work.
tdat <- read.table(textConnection("A B C Y
A12 B03 C04 0.70
A23 B05 C06 0.05
A14 B06 C07 1.20
A25 A23 A12 3.51
A16 A25 A14 2,16"),header = TRUE ,stringsAsFactors = FALSE)
tdat$D <- 0
tdat$E <- 0
tdat$D <- (ifelse(tdat$B %in% tdat$A, tdat$A[tdat$B], 0))
tdat$E <- (ifelse(tdat$B %in% tdat$A, tdat$A[tdat$C], 0))