I have a data frame of ~200 columns and ~20,000 rows where each column
consists of binary responses (0, 1) and a 9 for missing data. I am
interested in finding the columns for which fewer than 100 individuals
gave a response of 0.

I can use an apply function to generate a table for each column, but I'm
not certain whether I can subset a list based on some criterion, as
subset() is designed for vectors, matrices or data frames.

For example, I can use the following:

  tt <- apply(data, 2, table)

which returns an object of class list. Here is some sample output from tt:

  $R0235940b

      0     1     9
   2004  1076 15361

  $R0000710a

      0     9
      2 18439

  $R0000710b

      0     1     9
   3333  3941 11167

tt$R0000710a meets my criterion, and I would like to be able to find such
columns easily instead of scrolling through the entire output. Is there a
way to subset this list to identify the columns that meet the criterion
above?

Thanks,
Harold
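A list can be indexed with a logical vector just like an atomic vector, so subset() is not needed to filter tt. The sketch below assumes a small hypothetical stand-in for data (the columns a, b, c and the toy threshold of 2 are invented for illustration). It uses lapply() rather than apply(), because apply() first converts the data frame to a matrix and returns a matrix instead of a list whenever every column's table has the same length, in which case tt$name would fail.

```r
# Hypothetical stand-in for the real data: 0/1 responses, 9 = missing
data <- data.frame(
  a = c(0, 1, 9, 0, 1),
  b = c(9, 9, 9, 0, 9),
  c = c(1, 1, 9, 1, 1)
)

# One frequency table per column; lapply() always returns a named list
tt <- lapply(data, table)

# Number of 0 responses in one table; a table with no "0" entry counts as 0
n_zero <- function(t) if ("0" %in% names(t)) t[["0"]] else 0L

# Keep only the tables whose 0 count is below the (toy) threshold of 2
few_zeros <- tt[vapply(tt, n_zero, integer(1)) < 2]
names(few_zeros)
```

On the real data the threshold would be 100. Filter(function(t) n_zero(t) < 100, tt) is an equivalent base-R way to write the same selection.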
On Mon, 2006-05-22 at 17:55 -0400, Doran, Harold wrote:
> I am interested in finding the columns for which there are fewer than 100
> individuals with responses of 0.

Harold,

How about this:

  > DF
     V1 V2 V3 V4 V5
  1   0  1  0  1  0
  2   0  0  1  0  1
  3   0  0  1  1  0
  4   1  1  0  0  1
  5   1  1  1  1  0
  6   0  1  0  1  1
  7   0  1  1  1  0
  8   0  1  0  0  0
  9   0  0  1  1  0
  10  1  0  0  1  1

  # Find the columns with < 5 0's
  > which(sapply(DF, function(x) sum(x == 0)) < 5)
  V2 V4
   2  4

So in your case, just replace DF with your data frame name and the 5 with
100.

HTH,

Marc Schwartz
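To make the example above reproducible, the sketch below rebuilds DF from the printed values (transcribed by hand, so treat it as an assumption) and applies the same sapply()/which() expression. It also shows that names() on the result gives the column names directly, and that the returned indices can be used to pull out the matching columns; drop = FALSE keeps a data frame even if only one column qualifies.

```r
# DF transcribed from the console output above
DF <- data.frame(
  V1 = c(0, 0, 0, 1, 1, 0, 0, 0, 0, 1),
  V2 = c(1, 0, 0, 1, 1, 1, 1, 1, 0, 0),
  V3 = c(0, 1, 1, 0, 1, 0, 1, 0, 1, 0),
  V4 = c(1, 0, 1, 0, 1, 1, 1, 0, 1, 1),
  V5 = c(0, 1, 0, 1, 0, 1, 0, 0, 0, 1)
)

# Count the 0s in each column (a named integer vector)
zeros_per_col <- sapply(DF, function(x) sum(x == 0))

# Positions (named by column) of the columns with fewer than 5 zeros
idx <- which(zeros_per_col < 5)
names(idx)

# The qualifying columns themselves
DF[, idx, drop = FALSE]
```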
Doran, Harold wrote:
> I am interested in finding the columns for which there are fewer than 100
> individuals with responses of 0.

How about this?

  newdf <- mydf[, colSums(mydf == 0) < 100]

--
Chuck Cleland, Ph.D.
NDRI, Inc.
71 West 23rd Street, 8th floor
New York, NY 10010
tel: (212) 845-4495 (Tu, Th)
tel: (732) 512-0171 (M, W, F)
fax: (917) 438-0894
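The colSums() version works because mydf == 0 produces a logical matrix and colSums() counts the TRUE values in each column; the 9s are never equal to 0, so missing responses are not counted. The sketch below uses a hypothetical mydf and a toy threshold of 2 (both invented for illustration), checks that the counts agree with the sapply() approach, and adds drop = FALSE, without which a single qualifying column would come back as a plain vector rather than a data frame.

```r
# Hypothetical data: 0/1 responses with 9 coding a missing answer
mydf <- data.frame(
  x = c(0, 0, 0, 1, 9),
  y = c(1, 1, 9, 0, 1),
  z = c(0, 0, 1, 0, 9)
)

# Number of 0 responses per column (a named numeric vector)
zero_counts <- colSums(mydf == 0)

# Same counts as the sapply() approach, up to integer vs. double storage
all.equal(as.integer(zero_counts),
          unname(sapply(mydf, function(x) sum(x == 0))))

# Toy threshold of 2 instead of 100; drop = FALSE keeps a data frame
# even though only one column qualifies here
newdf <- mydf[, zero_counts < 2, drop = FALSE]
names(newdf)
```

If missing values were coded as NA instead of 9, colSums(mydf == 0, na.rm = TRUE) would be needed to avoid NA counts.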