Johannes Graumann
2007-Feb-02 15:32 UTC
[R] Dealing with Duplicates - How to count instances?
Hi there,

given a data.frame 'data' I managed to filter out entries (rows) that are identical with respect to one column like so:

duplicity <- duplicated(data[column])
data_unique <- subset(data,duplicity!=TRUE)

But I'm trying to extract how many duplicates each of the remaining rows had.

Can someone please send me down the right path for this?

Joh
table(data[column])

will give you the number of items in each subgroup; that would be the count you are after.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem you are trying to solve?
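As a minimal sketch of how the table() suggestion could be combined with the duplicated() filter from the original post: the example data, the column name "id", and the new columns "n" and "n_dupes" are made up for illustration and do not come from the thread.

```r
# Hypothetical example data; the column name "id" is an assumption.
data <- data.frame(id = c("a", "b", "a", "c", "a", "b"),
                   value = 1:6,
                   stringsAsFactors = FALSE)
column <- "id"

# Count how often each value of the column occurs.
counts <- table(data[[column]])

# Keep the first row for each value, as in the original post.
data_unique <- data[!duplicated(data[[column]]), ]

# Look up the count for each remaining row by name rather than by position.
data_unique$n <- as.vector(counts[as.character(data_unique[[column]])])

# Number of duplicates beyond the retained row.
data_unique$n_dupes <- data_unique$n - 1L
```

Indexing the table by as.character() of the key looks counts up by name, so the result should not depend on whether the column is stored as a factor or as character.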
Johannes Graumann
2007-Feb-02 20:37 UTC
[R] Dealing with Duplicates - How to count instances?
jim holtman wrote:
> table(data[column])
>
> will give you the number of items in each subgroup; that would be the
> count you are after.

Thanks for your help! That rocks! I can do

copynum <- table(data_6plus["Accession.number"])
data_6plus$"Repeats" <- sapply(data_6plus[["Accession.number"]], function(x) copynum[x][[1]])

now!

But how about this:
- do something along the lines of
  duplicity <- duplicated(data_6plus["Accession.number"])
  data_6plus_unique <- subset(data_6plus,duplicity!=TRUE)
- BUT: retain from each deleted row one field, append it to a vector, and fill that into a new field of the remaining row of the set sharing data_6plus["Accession.number"]?

How would you do something like that?

Joh
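One possible approach to the second question, offered only as a sketch and not as an answer given in the thread: split() the field of interest by accession number, keep the first row per accession number, and paste the values from the dropped rows into a new column. The column names "Peptide" and "Dropped" and the example data are hypothetical.

```r
# Hypothetical data; "Peptide" is an assumed column name, not from the post.
data_6plus <- data.frame(
  Accession.number = c("P1", "P2", "P1", "P3", "P1"),
  Peptide = c("AAA", "BBB", "CCC", "DDD", "EEE"),
  stringsAsFactors = FALSE
)

# One character vector of Peptide values per accession number,
# in the original row order within each group.
by_acc <- split(data_6plus$Peptide, data_6plus$Accession.number)

# Keep the first row per accession number, as in the post.
data_6plus_unique <- data_6plus[!duplicated(data_6plus$Accession.number), ]

# For each retained row, collect the values from the rows that were dropped
# (everything except the first element of its group) into one string.
key <- as.character(data_6plus_unique$Accession.number)
data_6plus_unique$Dropped <- unname(vapply(
  by_acc[key],
  function(v) paste(v[-1], collapse = ";"),
  character(1)
))
```

Rows without duplicates end up with an empty string here. If the individual values are needed later, a list column holding v[-1] directly might be a better fit than a pasted string.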