Hi,

I have a list of 40 data.frames. I would like to identify duplicated entries in the whole list, not only within one specific data.frame but across all 40.

Here is my list:

> myList
[[1]]
            X        NAME  MEM.SHIP
1 FBgn0000008 FBgn0000008 0.9304502
2 FBgn0000014 FBgn0000014 1.0000000
3 FBgn0000028 FBgn0000028 1.0000000
4 FBgn0000109 FBgn0000109 1.0000000
5 FBgn0000114 FBgn0000114 0.4839886
6 FBgn0000120 FBgn0000120 1.0000000

[[2]]
            X        NAME  MEM.SHIP
1 FBgn0000251 FBgn0000251 0.3138650
2 FBgn0001168 FBgn0001168 0.8995011
3 FBgn0001941 FBgn0001941 0.7485548
4 FBgn0003053 FBgn0000028 0.4426997
5 FBgn0003159 FBgn0003159 0.4843226
6 FBgn0000120 FBgn0003162 0.6556290

I would like to know whether there are duplicated entries in the first and/or second column across all of the data.frames. In this list there are two duplications: FBgn0000120 in row 6 of both data.frames, and FBgn0000028 in row 3 of df1 and row 4 of df2.

Is there a way to do this? With unique() I don't get any results, and I cannot convert the list into a single data.frame, as the number of items in each data.frame is different.

Thanks
Assa
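A minimal sketch of the two points raised above, assuming unique() was applied to the list itself: unique() on a list compares whole elements (here, entire data frames), so it cannot find individual repeated IDs, while rbind() stacks data frames whose row counts differ as long as their columns match. The two small data frames are hypothetical subsets of the rows shown in the question.

```r
# Two hypothetical data frames (a few rows from the question),
# with different numbers of rows
df1 <- data.frame(X    = c("FBgn0000008", "FBgn0000028", "FBgn0000120"),
                  NAME = c("FBgn0000008", "FBgn0000028", "FBgn0000120"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(X    = c("FBgn0003053", "FBgn0000120"),
                  NAME = c("FBgn0000028", "FBgn0003162"),
                  stringsAsFactors = FALSE)
myList <- list(df1, df2)

# unique() on a list compares whole elements (entire data frames);
# the two data frames differ, so both are kept and no single value is reported
length(unique(myList))   # 2

# rbind() only requires the same columns; the row counts may differ
stacked <- do.call(rbind, myList)
nrow(stacked)            # 5
```

Once the list is stacked like this, duplicated() can be applied to a whole column at once.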
Here is one way of doing it:

##########################
files <- list(file1 = " X NAME MEM.SHIP
1 FBgn0000008 FBgn0000008 0.9304502
2 FBgn0000014 FBgn0000014 1.0000000
3 FBgn0000028 FBgn0000028 1.0000000
4 FBgn0000109 FBgn0000109 1.0000000
5 FBgn0000114 FBgn0000114 0.4839886
6 FBgn0000120 FBgn0000120 1.0000000",
              file2 = " X NAME MEM.SHIP
1 FBgn0000251 FBgn0000251 0.3138650
2 FBgn0001168 FBgn0001168 0.8995011
3 FBgn0001941 FBgn0001941 0.7485548
4 FBgn0003053 FBgn0000028 0.4426997
5 FBgn0003159 FBgn0003159 0.4843226
6 FBgn0000120 FBgn0003162 0.6556290")

# read in all the "files" (dummies in this case)
# append file name
allFiles <- do.call(rbind, lapply(names(files), function(.name){
    input <- read.table(text = files[[.name]], as.is = TRUE)
    input$file <- .name
    input  # return value
}))

# function to mark all duplicate entries
allDup <- function(value)
{
    duplicated(value) | duplicated(value, fromLast = TRUE)
}
allFiles$col1 <- allDup(allFiles$X)
allFiles$col2 <- allDup(allFiles$NAME)
allFiles  # prints:

             X        NAME  MEM.SHIP  file  col1  col2
1  FBgn0000008 FBgn0000008 0.9304502 file1 FALSE FALSE
2  FBgn0000014 FBgn0000014 1.0000000 file1 FALSE FALSE
3  FBgn0000028 FBgn0000028 1.0000000 file1 FALSE  TRUE
4  FBgn0000109 FBgn0000109 1.0000000 file1 FALSE FALSE
5  FBgn0000114 FBgn0000114 0.4839886 file1 FALSE FALSE
6  FBgn0000120 FBgn0000120 1.0000000 file1  TRUE FALSE
11 FBgn0000251 FBgn0000251 0.3138650 file2 FALSE FALSE
21 FBgn0001168 FBgn0001168 0.8995011 file2 FALSE FALSE
31 FBgn0001941 FBgn0001941 0.7485548 file2 FALSE FALSE
41 FBgn0003053 FBgn0000028 0.4426997 file2 FALSE  TRUE
51 FBgn0003159 FBgn0003159 0.4843226 file2 FALSE FALSE
61 FBgn0000120 FBgn0003162 0.6556290 file2  TRUE FALSE
##########################################

Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
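A possible extension of the allDup() approach, sketched under the assumption that a value should only count as duplicated when it appears in more than one data frame (repeats inside a single data frame are ignored). Each row is tagged with the position of its source data frame, and for each column the sketch lists every such value together with the data frames it occurs in. dupAcross() is a hypothetical helper name, and the toy list is a subset of the rows in the question.

```r
# Hypothetical toy list (a subset of the rows shown in the question)
myList <- list(
  data.frame(X    = c("FBgn0000008", "FBgn0000028", "FBgn0000120"),
             NAME = c("FBgn0000008", "FBgn0000028", "FBgn0000120"),
             stringsAsFactors = FALSE),
  data.frame(X    = c("FBgn0003053", "FBgn0000120"),
             NAME = c("FBgn0000028", "FBgn0003162"),
             stringsAsFactors = FALSE)
)

# Tag every row with the position of the data frame it came from, then stack
tagged  <- Map(function(d, i) { d$df <- i; d }, myList, seq_along(myList))
stacked <- do.call(rbind, tagged)

# Hypothetical helper: for one column, keep the values that occur in more
# than one data frame, each with the (sorted) data frames it occurs in
dupAcross <- function(values, source) {
  sourcesPerValue <- lapply(split(source, values), function(s) sort(unique(s)))
  sourcesPerValue[lengths(sourcesPerValue) > 1]
}

dupAcross(stacked$X, stacked$df)     # FBgn0000120 in data frames 1 and 2
dupAcross(stacked$NAME, stacked$df)  # FBgn0000028 in data frames 1 and 2
```

Unlike allDup(), which flags every repeated row, this returns one entry per duplicated value, which may be easier to scan when the list holds 40 data frames.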
On Mon, May 19, 2014 at 8:48 AM, Assa Yeroslaviz <frymor@gmail.com> wrote:

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Hi,

You may try:

# myList as a reproducible object (dput() output)
myList <- list(
  structure(list(X = c("FBgn0000008", "FBgn0000014", "FBgn0000028",
                       "FBgn0000109", "FBgn0000114", "FBgn0000120"),
                 NAME = c("FBgn0000008", "FBgn0000014", "FBgn0000028",
                          "FBgn0000109", "FBgn0000114", "FBgn0000120"),
                 MEM.SHIP = c(0.9304502, 1, 1, 1, 0.4839886, 1)),
            .Names = c("X", "NAME", "MEM.SHIP"), class = "data.frame",
            row.names = c("1", "2", "3", "4", "5", "6")),
  structure(list(X = c("FBgn0000251", "FBgn0001168", "FBgn0001941",
                       "FBgn0003053", "FBgn0003159", "FBgn0000120"),
                 NAME = c("FBgn0000251", "FBgn0001168", "FBgn0001941",
                          "FBgn0000028", "FBgn0003159", "FBgn0003162"),
                 MEM.SHIP = c(0.313865, 0.8995011, 0.7485548, 0.4426997,
                              0.4843226, 0.655629)),
            .Names = c("X", "NAME", "MEM.SHIP"), class = "data.frame",
            row.names = c("1", "2", "3", "4", "5", "6")))

library(data.table)
# stack all data frames into one data.table (row counts may differ)
dt1 <- rbindlist(myList)
# TRUE for every occurrence of a value that appears more than once
fun1 <- function(val){duplicated(val) | duplicated(val, fromLast = TRUE)}
# flag duplicates in columns 1 (X) and 2 (NAME), adding Col1 and Col2 by reference
dt1[, paste0("Col", 1:2) := lapply(.SD, fun1), .SDcols = 1:2]
dt1

A.K.

On Monday, May 19, 2014 8:50 AM, Assa Yeroslaviz <frymor at gmail.com> wrote: