Dear group,

I have a data frame as follows:

   Measure_id i j value      rank
1           1 2 3   2.0 1.0000000
2           1 5 1   2.0 1.0000000
3           1 2 1   1.5 0.7500000
4           1 5 2   1.5 0.7500000
5           1 7 3   1.5 1.0000000
6           1 2 4   1.0 0.5000000
7           1 7 5   1.0 0.6666667
8           2 5 2   2.5 1.0000000
9           2 2 1   2.0 1.0000000
10          2 2 4   2.0 1.0000000
..        ... . .   ...       ...

I want to select distinct rows based on two columns (Measure_id and i). For example, for Measure_id = 1, 2 the result would be:

1           1 2 3   2.0 1.0000000
2           1 5 1   2.0 1.0000000
5           1 7 3   1.5 1.0000000
8           2 5 2   2.5 1.0000000
9           2 2 1   2.0 1.0000000

Could you kindly tell me how I could do this? An example of the data frame, produced with dput, follows.

dput(orderlist)

structure(list(Measure_id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5,
5, 5, 5), i = c(2, 5, 2, 5, 7, 2, 7, 5, 2, 2, 7, 2, 5, 7, 2,
2, 2, 5, 5, 7, 7, 2, 5, 2, 2, 5, 7, 7, 2, 2, 5, 2, 5, 7, 7),
    j = c(3, 1, 1, 2, 3, 4, 5, 2, 1, 4, 5, 3, 1, 3, 1, 3, 4,
    1, 2, 3, 5, 4, 2, 1, 3, 1, 3, 5, 1, 4, 2, 3, 1, 3, 5), value = c(2,
    2, 1.5, 1.5, 1.5, 1, 1, 2.5, 2, 2, 2, 1.5, 1.5, 1, 1, 0,
    0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 2, 2, 2, 1, 1, 1, 1),
    rank = c(1, 1, 0.75, 0.75, 1, 0.5, 0.666666666666667, 1,
    1, 1, 1, 0.75, 0.6, 0.5, 1, 0, 0, NaN, NaN, NaN, NaN, 1,
    1, 0, 0, 0, NaN, NaN, 1, 1, 1, 0.5, 0.5, 1, 1)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -35L), .Names = c("Measure_id",
"i", "j", "value", "rank"), vars = list(Measure_id), indices = list(
    0:6, 7:13, 14:20, 21:27, 28:34), group_sizes = c(7L, 7L,
7L, 7L, 7L), biggest_group_size = 7L, labels = structure(list(
    Measure_id = c(1, 2, 3, 4, 5)), class = "data.frame", row.names = c(NA,
-5L), .Names = "Measure_id", vars = list(Measure_id)))

Thanks in advance,
Ragia
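The dput() output above appears to come from an older dplyr grouped_df: its vars = list(Measure_id) element refers to Measure_id as an unquoted object, so pasting the structure() call into a fresh R session will typically stop with an "object 'Measure_id' not found" error. A minimal sketch, assuming only the five columns matter and the dplyr grouping attributes (vars, indices, labels, ...) can be dropped, rebuilds the same data as a plain data.frame under the poster's object name orderlist:

```r
# Plain data.frame version of the posted data (assumption: grouping attributes
# are not needed). Vectors are laid out in blocks of seven rows per Measure_id.
orderlist <- data.frame(
  Measure_id = rep(c(1, 2, 3, 4, 5), each = 7),
  i     = c(2, 5, 2, 5, 7, 2, 7,  5, 2, 2, 7, 2, 5, 7,  2, 2, 2, 5, 5, 7, 7,
            2, 5, 2, 2, 5, 7, 7,  2, 2, 5, 2, 5, 7, 7),
  j     = c(3, 1, 1, 2, 3, 4, 5,  2, 1, 4, 5, 3, 1, 3,  1, 3, 4, 1, 2, 3, 5,
            4, 2, 1, 3, 1, 3, 5,  1, 4, 2, 3, 1, 3, 5),
  value = c(2, 2, 1.5, 1.5, 1.5, 1, 1,  2.5, 2, 2, 2, 1.5, 1.5, 1,
            1, 0, 0, 0, 0, 0, 0,  1, 1, 0, 0, 0, 0, 0,  2, 2, 2, 1, 1, 1, 1),
  rank  = c(1, 1, 0.75, 0.75, 1, 0.5, 0.666666666666667,
            1, 1, 1, 1, 0.75, 0.6, 0.5,
            1, 0, 0, NaN, NaN, NaN, NaN,
            1, 1, 0, 0, 0, NaN, NaN,
            1, 1, 1, 0.5, 0.5, 1, 1)
)

# Inspect the column types and the first values of each column.
str(orderlist)
```

Posting the output of dput() on such a plain data frame keeps the example reproducible for readers who do not have dplyr loaded.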
A logical expression applied to a vector (such as a data frame column) gives you a logical vector that you can use for selection. You can combine several of these with the & (AND) and | (OR) operators. In your case, you apparently want to match a set of possible values: use the %in% operator. Consider, e.g.:

    orderlist$i == 2
    orderlist$i == 2 & orderlist$j < 3
    orderlist$i %in% c(5, 7)

Cheers,
B.

On Nov 29, 2015, at 10:55 PM, Ragia Ibrahim <ragia11 at hotmail.com> wrote:
> I want to select distinct rows based on two columns (Measure_id and i)
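Applied to the posted data, the logical-vector selection B. describes can be sketched as follows. The data frame is rebuilt by hand here (the rank column is left out because it is not needed), and the object names is_i2, is_i2_low_j, is_i5_or_7 and measures_1_2 are illustrative only:

```r
# The thread's data, without the rank column.
orderlist <- data.frame(
  Measure_id = rep(c(1, 2, 3, 4, 5), each = 7),
  i = c(2, 5, 2, 5, 7, 2, 7,  5, 2, 2, 7, 2, 5, 7,  2, 2, 2, 5, 5, 7, 7,
        2, 5, 2, 2, 5, 7, 7,  2, 2, 5, 2, 5, 7, 7),
  j = c(3, 1, 1, 2, 3, 4, 5,  2, 1, 4, 5, 3, 1, 3,  1, 3, 4, 1, 2, 3, 5,
        4, 2, 1, 3, 1, 3, 5,  1, 4, 2, 3, 1, 3, 5)
)

# Each comparison yields one TRUE/FALSE per row of the data frame.
is_i2       <- orderlist$i == 2
is_i2_low_j <- orderlist$i == 2 & orderlist$j < 3   # combined with AND
is_i5_or_7  <- orderlist$i %in% c(5, 7)             # same as i == 5 | i == 7 here

# A logical vector in the row position of [ , ] keeps the TRUE rows;
# the empty column position keeps all columns.
print(orderlist[is_i2_low_j, ])
measures_1_2 <- orderlist[orderlist$Measure_id %in% c(1, 2), ]
print(measures_1_2)
```

This selects rows by their values; on its own it does not drop repeated (Measure_id, i) combinations.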
Ragia Ibrahim <ragia11 at hotmail.com> [Mon, Nov 30, 2015 at 04:55:08AM CET]:
>Dear group,
>kindly, I have a data frame, as follows:
>
>   Measure_id i j value      rank
>
>I want to select distinct rows based on two columns (Measure_id and i)

I didn't get your example code to run, but the following works for me:

    dfr <- data.frame(Measure_id = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2,
                                     3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4,
                                     5, 5, 5, 5, 5, 5, 5),
                      i = c(2, 5, 2, 5, 7, 2, 7, 5, 2, 2, 7, 2, 5, 7, 2, 2, 2, 5,
                            5, 7, 7, 2, 5, 2, 2, 5, 7, 7, 2, 2, 5, 2, 5, 7, 7),
                      j = c(3, 1, 1, 2, 3, 4, 5, 2, 1, 4, 5, 3, 1, 3, 1, 3, 4, 1,
                            2, 3, 5, 4, 2, 1, 3, 1, 3, 5, 1, 4, 2, 3, 1, 3, 5),
                      value = c(2, 2, 1.5, 1.5, 1.5, 1, 1, 2.5, 2, 2, 2, 1.5, 1.5,
                                1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 2, 2,
                                2, 1, 1, 1, 1))
    dfr[!duplicated(dfr[, c("Measure_id", "i")]), ]

This returns

   Measure_id i j value
1           1 2 3   2.0
2           1 5 1   2.0
5           1 7 3   1.5
8           2 5 2   2.5
9           2 2 1   2.0
11          2 7 5   2.0
15          3 2 1   1.0
18          3 5 1   0.0
20          3 7 3   0.0
22          4 2 4   1.0
23          4 5 2   1.0
27          4 7 3   0.0
29          5 2 1   2.0
31          5 5 2   2.0
34          5 7 3   1.0

Is that what you were aiming at?

Note that I had to find the solution myself, which I did thanks to the documentation I got from ?unique, which pointed me to duplicated().

--
Johannes Hüsing
http://derwisch.wikidot.com
Threema-ID: VHVJYH3H
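The two replies can be combined to reproduce the "Measure_id = 1, 2" case from the question: restrict the rows with %in%, then drop repeated (Measure_id, i) pairs with duplicated(). On this data the sketch below also returns row 11 (Measure_id 2, i 7), which the example result in the question does not list, presumably because only the first ten rows were shown. The second part rests on an assumption not stated in the thread: that the row wanted for each pair is the one with the largest value. Sorting by value within Measure_id before calling duplicated() makes that choice independent of the input order. The names sub12, distinct12, sorted and best are illustrative only.

```r
dfr <- data.frame(
  Measure_id = rep(c(1, 2, 3, 4, 5), each = 7),
  i     = c(2, 5, 2, 5, 7, 2, 7,  5, 2, 2, 7, 2, 5, 7,  2, 2, 2, 5, 5, 7, 7,
            2, 5, 2, 2, 5, 7, 7,  2, 2, 5, 2, 5, 7, 7),
  j     = c(3, 1, 1, 2, 3, 4, 5,  2, 1, 4, 5, 3, 1, 3,  1, 3, 4, 1, 2, 3, 5,
            4, 2, 1, 3, 1, 3, 5,  1, 4, 2, 3, 1, 3, 5),
  value = c(2, 2, 1.5, 1.5, 1.5, 1, 1,  2.5, 2, 2, 2, 1.5, 1.5, 1,
            1, 0, 0, 0, 0, 0, 0,  1, 1, 0, 0, 0, 0, 0,  2, 2, 2, 1, 1, 1, 1)
)

# 1) Keep Measure_id 1 and 2 only, then the first row of each (Measure_id, i)
#    pair; duplicated() flags every later repeat of a pair as TRUE.
sub12 <- dfr[dfr$Measure_id %in% c(1, 2), ]
distinct12 <- sub12[!duplicated(sub12[, c("Measure_id", "i")]), ]
print(distinct12)

# 2) Assumption: the wanted row per pair is the one with the largest value.
#    Sort by Measure_id, then by value descending; order() keeps remaining
#    ties in their original order, so duplicated() picks the first such row.
sorted <- dfr[order(dfr$Measure_id, -dfr$value), ]
best <- sorted[!duplicated(sorted[, c("Measure_id", "i")]), ]
print(best)
```

In the posted data the rows are already in descending value order within each Measure_id, so the sorted variant selects the same rows as Johannes's one-liner; the sort only matters if the input order changes.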