Hi everybody,

I have a little problem in my R code which seems to be easy to solve, but I haven't been able to find the solution by myself so far.

Here's an example of the form of my data:

data <- data.frame(col1=c("a","a","b","b"),col2=c(1,1,2,2),col3=c(NA,"ST001","ST002",NA))

I would like to remove duplicated rows based on the first two columns (col1, col2). In both cases here, I would like to remove the duplicate whose col3 is NA.

Here's the data.frame I would like to obtain:

data2 <- data.frame(col1=c("a","b"),col2=c(1,2),col3=c("ST001","ST002"))

I've been trying to combine duplicated() with is.na(), but it doesn't work yet.

Can someone tell me the best and easiest way to do this?

Thanks a lot!

--
View this message in context: http://r.789695.n4.nabble.com/remove-duplicated-row-according-to-NA-condition-tp4691362.html
Sent from the R help mailing list archive at Nabble.com.
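A quick base-R check shows why duplicated() on its own is not enough here. It keeps the first row of each col1/col2 pair, and for the pair a/1 that first row happens to be the NA one. This is a minimal sketch using only the example data from the question.

```r
# Example data from the question
data <- data.frame(col1 = c("a", "a", "b", "b"),
                   col2 = c(1, 1, 2, 2),
                   col3 = c(NA, "ST001", "ST002", NA))

# duplicated() flags every occurrence after the first of each col1/col2
# combination, so negating it keeps the first row of each pair
naive <- data[!duplicated(data[, c("col1", "col2")]), ]
print(naive)
```

This keeps rows 1 and 3, so the a/1 pair retains its NA instead of ST001. The row order therefore has to be changed before duplicated() is applied.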
Hi!

How about trying this:

data[data$col1 != data$col2 & !is.na(data$col3), ]
  col1 col2  col3
2    a    1 ST001
3    b    2 ST002

HTH,
Kimmo

(Reply to jeff6868, 28.05.2014 15:35.)
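In this data col1 holds letters and col2 holds numbers. The comparison data$col1 != data$col2 therefore compares strings such as "a" with "1" and is TRUE on every row, so the filter reduces to !is.na(data$col3). Here is a minimal check, assuming the example data from the question:

```r
data <- data.frame(col1 = c("a", "a", "b", "b"),
                   col2 = c(1, 1, 2, 2),
                   col3 = c(NA, "ST001", "ST002", NA))

# Mixed comparison: col2 is coerced to character, so this compares "a" with "1", etc.
always_true <- data$col1 != data$col2
print(always_true)

# The posted filter selects the same rows as simply dropping the NA rows
kimmo   <- data[data$col1 != data$col2 & !is.na(data$col3), ]
na_only <- data[!is.na(data$col3), ]
print(identical(kimmo, na_only))
```

The filter works for this example, but it does not remove duplicates as such. Two non-NA rows for the same pair would both stay, and a pair whose rows are all NA would disappear completely.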
It would help if you said what you want done when none, all, or some of the col1-col2 duplicates have NAs in col3. E.g., what do you want the function to do for the following input?

> data2 <- data.frame(col1=c("a","a","a","b","b","c","c","d","d","e"),col2=c(1,1,1,2,2,3,3,4,4,5), col3=c("A1",NA,"A3",NA,"B2","C1","C2",NA,NA,NA))
> data2
   col1 col2 col3
1     a    1   A1
2     a    1 <NA>
3     a    1   A3
4     b    2 <NA>
5     b    2   B2
6     c    3   C1
7     c    3   C2
8     d    4 <NA>
9     d    4 <NA>
10    e    5 <NA>

(You may want it to return a data.frame, or you may want the function to stop because the data is not considered legal, but you should decide what it should do.)

Bill Dunlap
TIBCO Software
wdunlap tibco.com

(Reply to jeff6868 <geoffrey_klein at etu.u-bourgogne.fr>, Wed, May 28, 2014 at 5:35 AM.)
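Below is a sketch of one possible policy for these cases, written as a hypothetical helper drop_na_dups(). Both the name and the policy are assumptions; the original poster did not specify either. The helper drops an NA row only when its col1/col2 group also has a non-NA col3. It keeps every non-NA row, and for groups that are entirely NA it keeps just the first row.

```r
# Hypothetical helper implementing one of several reasonable policies
drop_na_dups <- function(df) {
  # Group key built from the two columns that define a duplicate
  key <- interaction(df$col1, df$col2, drop = TRUE)
  # TRUE on every row whose group contains at least one non-NA col3
  has_value <- ave(!is.na(df$col3), key, FUN = any)
  # Keep all non-NA rows; from all-NA groups keep only the first occurrence
  keep <- !is.na(df$col3) | (!has_value & !duplicated(key))
  df[keep, ]
}

data2 <- data.frame(col1 = c("a","a","a","b","b","c","c","d","d","e"),
                    col2 = c(1,1,1,2,2,3,3,4,4,5),
                    col3 = c("A1",NA,"A3",NA,"B2","C1","C2",NA,NA,NA))
print(drop_na_dups(data2))
```

With this policy, pairs a and c keep both of their values. Other choices are equally defensible, such as keeping a single value per pair or calling stop() on an all-NA pair.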
Hi,

Maybe this helps:

# sort so that, within each col1/col2 pair, rows with a non-NA col3 come first
data1 <- data[with(data, order(col1, col2, 1*is.na(col3))), ]
# then keep only the first row of each col1/col2 pair
data1[!duplicated(data1[, 1:2]), ]

A.K.

(Reply to jeff6868 <geoffrey_klein at etu.u-bourgogne.fr>, Wednesday, May 28, 2014 11:28 AM.)
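The same sort-then-duplicated() idea can be wrapped in a small function. The name keep_first_non_na() is hypothetical, and resetting the row names is an optional extra so that the result lines up with data2 from the question. Because order() puts FALSE before TRUE, rows with a value in col3 sort ahead of NA rows within each col1/col2 pair. Ties are left in their original order, and duplicated() then discards everything after the first row of each pair.

```r
# Hypothetical wrapper around the order()/duplicated() approach
keep_first_non_na <- function(df) {
  # Within each col1/col2 pair, non-NA col3 rows come first (FALSE < TRUE)
  sorted <- df[order(df$col1, df$col2, is.na(df$col3)), ]
  # Keep the first row of each col1/col2 pair
  out <- sorted[!duplicated(sorted[, c("col1", "col2")]), ]
  rownames(out) <- NULL  # renumber rows 1..n
  out
}

data <- data.frame(col1 = c("a", "a", "b", "b"),
                   col2 = c(1, 1, 2, 2),
                   col3 = c(NA, "ST001", "ST002", NA))
print(keep_first_non_na(data))
```

The result comes back sorted by col1 and col2. On Bill Dunlap's example it keeps exactly one row per pair (A1, B2, C1, then NA for d and e), so a pair yields an NA row only when it has no value at all.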