Hi all,

I'm looking at a large data set, and I'm interested in removing rows where only one variable is duplicated. Here's an example:

> presidents
     Qtr1 Qtr2 Qtr3 Qtr4
1945   NA   87   82   75
1946   63   50   43   32
1947   35   60   54   55
1948   36   39   NA   NA
1949   69   57   57   51
1950   45   37   46   39
1951   36   24   32   23
1952   25   32   NA   32
1953   59   74   75   60
1954   71   61   71   57
1955   71   68   79   73
1956   76   71   67   75
1957   79   62   63   57
1958   60   49   48   52
1959   57   62   61   66
1960   71   62   61   57
1961   72   83   71   78
1962   79   71   62   74
1963   76   64   62   57
1964   80   73   69   69
1965   71   64   69   62
1966   63   46   56   44
1967   44   52   38   46
1968   36   49   35   44
1969   59   65   65   56
1970   66   53   61   52
1971   51   48   54   49
1972   49   61   NA   NA
1973   68   44   40   27
1974   28   25   24   24

See how in 1954 and 1955 the Qtr1 approval rating is the same? Let's say I wanted to return the presidents data frame, but with only unique values for Qtr1. It doesn't matter which years are displayed for duplicated values; it just matters that each value is not displayed more than once. Is there any way I can do this and still have a data frame that shows the Qtr2, Qtr3, and Qtr4 values?

Thanks in advance,
Andrew

--
View this message in context: http://r.789695.n4.nabble.com/Unique-subsetting-question-tp2550453p2550453.html
Sent from the R help mailing list archive at Nabble.com.
Hi,

Take a look at ?duplicated and ?unique

HTH,
Ivan

On 9/22/2010 16:55, AndrewPage wrote:
> Hi all,
>
> I'm looking at a large data set, and I'm interested in removing rows where
> only one variable is duplicated.
> [...]
> Thanks in advance,
> Andrew

--
Ivan CALANDRA
PhD Student
University of Hamburg
Biozentrum Grindel und Zoologisches Museum
Abt. Säugetiere
Martin-Luther-King-Platz 3
D-20146 Hamburg, GERMANY
+49(0)40 42838 6231
ivan.calandra@uni-hamburg.de

**********
http://www.for771.uni-bonn.de
http://webapp5.rrz.uni-hamburg.de/mammals/eng/mitarbeiter.php
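For the presidents example, the ?duplicated hint could be applied roughly as follows. This is only a minimal sketch: it assumes the data are first reshaped into a data frame (in base R, presidents is a quarterly time series rather than a data frame), and the names pres and pres_unique are made up for illustration.

```r
# Assumption: build a data frame from the built-in quarterly series
# `presidents` (1945-1974, four quarters per year). `pres` is a
# hypothetical name, not something from the original post.
pres <- as.data.frame(matrix(as.numeric(presidents), ncol = 4, byrow = TRUE))
names(pres) <- paste0("Qtr", 1:4)
rownames(pres) <- 1945:1974

# duplicated() flags every repeat of a value after its first occurrence,
# so negating it keeps the first row seen for each Qtr1 value.
pres_unique <- pres[!duplicated(pres$Qtr1), ]

# Only rows were dropped; Qtr2, Qtr3 and Qtr4 are still there.
head(pres_unique)
```

With this approach the surviving row for a tied value is the first one in row order (1954 rather than 1955), which should be fine since the choice of year does not matter here.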
Hi Andrew,

Perhaps you did not notice my previous email. The answer is still the same (see below):

On Wed, Sep 22, 2010 at 1:48 PM, AndrewPage <savejarvis at yahoo.com> wrote:
>
> How about this:
>
> > s = c("aa", "bb", "cc", "", "aa", "dd", "", "aa")
> > n = c(2, 3, 5, 6, 7, 8, 9, 3)
> > b = c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
> > df = data.frame(n, s, b)       # df is a data frame
>
> I want to display df with no value in s occurring more than once.

    df <- df[!duplicated(df$s), ]

> Also, I want to delete the rows where s contains "".

Same idea here:

    df[df$s != "", ]

-Ista

--
Ista Zahn
Graduate student
University of Rochester
Department of Clinical and Social Psychology
http://yourpsyche.org
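The two steps above can also be combined into a single subsetting call. The following is a small sketch using the example data from the quoted message; res is a hypothetical name for the result. Note that the condition refers to df$s rather than the free-standing vector s, so the logical index always has the same length as the data frame being subset.

```r
# Rebuild the example data from the quoted message.
s <- c("aa", "bb", "cc", "", "aa", "dd", "", "aa")
n <- c(2, 3, 5, 6, 7, 8, 9, 3)
b <- c(TRUE, FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE)
df <- data.frame(n, s, b)

# Keep the first row for each value of s, and drop rows where s is "".
# Both conditions are computed on df$s, so they line up row by row.
res <- df[!duplicated(df$s) & df$s != "", ]
res
```

This leaves one row each for "aa", "bb", "cc" and "dd", with the n and b columns carried along unchanged.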