Bob Green
2008-Dec-24 12:16 UTC
[R] selecting a subset of a matrix based on a value occurring in 5 records
Hello,

I am hoping for some advice as to how I might create a subset of a matrix. The matrix is 176 x 3530. The rows are individual records and the columns are words. I want to create a new matrix that consists only of words which occur in at least 5 records. For example, if column 7 is "charges" and this appears in only 4 records/rows, this variable would not be included, whereas if column 109 was the word "monitor" and occurred in 95 records, it would be saved into the new matrix. Values in the matrix are counts: if a word does not occur in a record the cell contains a zero, whereas if it occurs 7 times there is a value of 7 for that record. It is the number of records, rather than the column total, that is the criterion for determining inclusion in the new matrix.

Any suggestions on how I might reduce the size of this matrix so as to include only those columns in which a word occurs in at least 5 records would be much appreciated.

regards

Bob
Veslot Jacques
2008-Dec-24 13:28 UTC
[R] selecting a subset of a matrix based on a value occurring in 5 records
> mat[, colSums(mat != 0) >= 5]

Jacques VESLOT
CEMAGREF - UR Hydrobiologie
Route de Cézanne - CS 40061
13182 AIX-EN-PROVENCE Cedex 5, France
Tél. + 0033 04 42 66 99 76
fax + 0033 04 42 66 99 34
email jacques.veslot at cemagref.fr
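
To see why the one-liner above counts records rather than column totals, here is a minimal sketch on a toy matrix. The word names, the counts, and the variable names records_per_word and reduced are made up for illustration and are not from the original data; drop = FALSE is an addition not present in the reply.

```r
# Toy term-document matrix: 6 records (rows) x 3 words (columns).
# All values here are hypothetical, chosen only to illustrate the idea.
mat <- cbind(
  charges = c(9, 0, 0, 3, 0, 0),  # occurs in 2 records, column total 12
  monitor = c(1, 1, 1, 1, 1, 0),  # occurs in 5 records, column total 5
  ward    = c(0, 2, 0, 0, 7, 1)   # occurs in 3 records, column total 10
)

# mat != 0 is a logical matrix (TRUE where the word occurs in that record).
# colSums() treats TRUE as 1, so this is the number of records per word.
records_per_word <- colSums(mat != 0)

# Keep only the words that occur in at least 5 records.
# drop = FALSE keeps the result a matrix even when one column survives.
reduced <- mat[, records_per_word >= 5, drop = FALSE]

print(records_per_word)
print(dim(reduced))
```

Here only "monitor" is kept, even though "charges" has the larger column total, which is the distinction the question asks for. If the real matrix could contain missing values, colSums(mat != 0, na.rm = TRUE) may be the safer form.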