Dear R Team,

I am a new R user and I am currently trying to subset my data under a special condition. I have gone through several pages of the subsetting section here on the forum, but I was not able to find an answer.

My data is as follows:

ID  NAME      MS  Pol. Party
1   John      x   F
2   Mary      s   S
3   Katie     x   O
4   Sarah     p   L
5   Martin    x   O
6   Angelika  x   F
7   Smith     x   O
....

I am interested in only those observations where there are at least three members of one political party. That is, I need to throw out all cases in the example above, except for members of party "O".

Would really appreciate your help.
K

--
View this message in context: http://r.789695.n4.nabble.com/subsetting-with-condition-tp3567193p3567193.html
Sent from the R help mailing list archive at Nabble.com.
On Jun 1, 2011, at 7:00 PM, kristina p wrote:
> I am interested in only those observations where there are at least
> three members of one political party. That is, I need to throw out all
> cases in the example above, except for members of party "O".

Assume this is in a data frame, 'pol', and that you have corrected the error in the column names, so that the party column is Pol_Party. The ave function is particularly useful when you need a vector that "lines up alongside" the other columns:

> pol[ave(seq_along(pol$ID), pol$Pol_Party, FUN=length) >= 3, ]
  ID   NAME MS Pol_Party
3  3  Katie  x         O
5  5 Martin  x         O
7  7  Smith  x         O

(Using seq_along means the count is taken over row positions rather than over the ID values themselves, so rows with duplicated IDs in a qualifying party are all kept.)

Another way to generate the values would be to table()-ulate and pick out the names of qualifying parties:

> tabl.party <- table(pol$Pol_Party)
> pol[ pol$Pol_Party %in% names(tabl.party)[tabl.party >= 3], ]
  ID   NAME MS Pol_Party
3  3  Katie  x         O
5  5 Martin  x         O
7  7  Smith  x         O

Both methods use logical indexing with the "[.data.frame" function.

--
David Winsemius, MD
West Hartford, CT
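For anyone who wants to run the two approaches from this reply end to end, here is a minimal sketch. The data frame is re-typed by hand from the question, and the object names party_size, by_ave and by_table are made up for illustration; they are not from the original posts.

```r
# Rebuild the example data from the question by hand
# (party column renamed to Pol_Party, as the reply assumes).
pol <- data.frame(
  ID        = 1:7,
  NAME      = c("John", "Mary", "Katie", "Sarah", "Martin", "Angelika", "Smith"),
  MS        = c("x", "s", "x", "p", "x", "x", "x"),
  Pol_Party = c("F", "S", "O", "L", "O", "F", "O"),
  stringsAsFactors = FALSE
)

# Approach 1: ave() gives every row the size of that row's party,
# so the result has one entry per row and can be used as a row filter.
party_size <- ave(seq_along(pol$Pol_Party), pol$Pol_Party, FUN = length)
by_ave <- pol[party_size >= 3, ]

# Approach 2: count the parties once with table(), then keep the rows
# whose party is among the names that reach the threshold.
tabl.party <- table(pol$Pol_Party)
by_table <- pol[pol$Pol_Party %in% names(tabl.party)[tabl.party >= 3], ]

print(by_ave)
print(identical(by_ave, by_table))
```

The ave() version needs no intermediate object, while the table() version makes the per-party counts available for inspection before filtering.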
Try this:

subset(x, ave(x$ID, x$Pol., FUN = length) >= 3)

On Wed, Jun 1, 2011 at 8:00 PM, kristina p <puzarina.k at gmail.com> wrote:
> I am interested in only those observations where there are at least three
> members of one political party. That is, I need to throw out all cases in the
> example above, except for members of party "O".

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
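The one-liner above appears to rely on `$` partially matching `Pol.` against the full column name. A hedged sketch of the same idea with the column spelled out in full follows; it assumes the header "Pol. Party" was converted to Pol..Party on import, which is what R's default name checking does with a space. The object names x and kept are illustrative only.

```r
# Hand-typed stand-in for the poster's data. check.names = TRUE mimics
# what read.table() does to a header such as "Pol. Party".
x <- data.frame(
  ID           = 1:7,
  `Pol. Party` = c("F", "S", "O", "L", "O", "F", "O"),
  check.names      = TRUE,
  stringsAsFactors = FALSE
)
print(names(x))  # shows the syntactic name R substituted for "Pol. Party"

# subset() evaluates its condition inside the data frame, so the columns
# can be referred to by their full names without the x$ prefix.
kept <- subset(x, ave(seq_along(ID), Pol..Party, FUN = length) >= 3)
print(kept)
```

Writing the full column name avoids surprises if another column starting with "Pol." is added later.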
Kristina:

You posed your question nicely, but it would help R HelperRs if you used dput() to post your data, so that in future we can more easily copy and paste it into R.

Anyway, there are probably about a million ways to do this (see especially the plyr package and its ddply function for organizing data), but one basic approach is to use table() to count Pol. parties (a bad name for a variable, btw, as the space requires backtick quoting) and then use the names attribute of the result to identify the parties you want, i.e.

tbl <- table(polParties)
names(tbl[tbl >= 3]) ## gives the names of polParties with >= 3 entries

Then use subset (or indexing) on these with %in% etc.

-- Bert

On Wed, Jun 1, 2011 at 4:00 PM, kristina p <puzarina.k at gmail.com> wrote:
> I am interested in only those observations where there are at least three
> members of one political party. That is, I need to throw out all cases in the
> example above, except for members of party "O".

--
"Men by nature long to get on to the ultimate truths, and will often be impatient with elementary studies or fight shy of them. If it were possible to reach the ultimate truths without the elementary studies usually prefixed to them, these would not be preparatory studies but superfluous diversions."
-- Maimonides (1135-1204)

Bert Gunter
Genentech Nonclinical Biostatistics
467-7374
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
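As a rough illustration of the dput() suggestion and of the final "subset with %in%" step, here is a small sketch. The data frame is typed in by hand from the question, and the names pol, pol_copy, keep and result are assumptions made for the example.

```r
# Hand-typed copy of the example data (party column renamed Pol_Party).
pol <- data.frame(
  ID        = 1:7,
  NAME      = c("John", "Mary", "Katie", "Sarah", "Martin", "Angelika", "Smith"),
  MS        = c("x", "s", "x", "p", "x", "x", "x"),
  Pol_Party = c("F", "S", "O", "L", "O", "F", "O"),
  stringsAsFactors = FALSE
)

# dput() prints an R expression that recreates the object; pasting that
# text into a post lets readers rebuild the data with copy and paste.
dput(pol)

# What a reader effectively does with the pasted text: evaluate it.
pol_copy <- eval(parse(text = paste(capture.output(dput(pol)), collapse = "\n")))

# Count the parties, keep the names that reach the threshold, then filter.
tbl    <- table(pol_copy$Pol_Party)
keep   <- names(tbl[tbl >= 3])
result <- subset(pol_copy, Pol_Party %in% keep)
print(result)
```

Posting the dput() output instead of a formatted table also preserves column types, which a pasted table cannot show.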
Thank you all, I tried all the options and they give me exactly what I needed! Many, many thanks again.

To Bert: oh, I see, yes, next time I will do that.

Kristina