Hi netters, I have a data frame A with several columns (variables). The elements of column M are character strings, so A$M = c("ab", "abc", "bcd", "ac", "abcd", "fg", ....."fl"). I want to extract all the rows where A$M matches some regular expression pattern. For a simple example, let the pattern be just "ab": I want to subset the rows where A$M = "ab" or "abc" or "abcd" or "abXX"... I know I can write a loop that applies a regular-expression function such as grep row by row, but when A is large this is inefficient. Could anyone give me a hint about faster code? Thanks a lot!
Peter Dalgaard
2005-Dec-03 11:59 UTC
[R] how to subset rows using regular expression patterns
"zhihua li" <lzhtom at hotmail.com> writes:

Notice that grep() returns an index vector, so

A[grep(pattern, A$M), ]

should do it. With subset(), which expects a logical condition rather than row indices, use grepl() instead: subset(A, grepl(pattern, M)).

-- 
Peter Dalgaard, Dept. of Biostatistics, University of Copenhagen
Øster Farimagsgade 5, Entr.B, PO Box 2099, 1014 Cph. K, Denmark
Ph: (+45) 35327918, FAX: (+45) 35327907 (p.dalgaard at biostat.ku.dk)
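A minimal sketch of the vectorized approach, using a small hypothetical data frame: the M values are a few of those from the question, and the column x is invented for illustration. grep() makes one pass over the whole column instead of one call per row, and the resulting index vector selects the rows directly.

```r
# Hypothetical example data: a few M values from the question,
# plus an invented numeric column x to show that whole rows are kept.
A <- data.frame(
  M = c("ab", "abc", "bcd", "ac", "abcd", "fg", "fl"),
  x = 1:7,
  stringsAsFactors = FALSE
)
pattern <- "ab"

# One vectorized call: returns the positions of matching elements.
idx <- grep(pattern, A$M)
idx                           # → 1 2 5

# Index the rows with that vector (note the trailing comma).
A[idx, ]

# subset() needs a logical condition; grepl() gives TRUE/FALSE per row.
subset(A, grepl(pattern, M))
```

Both forms return the same three rows ("ab", "abc", "abcd"); grepl() requires R 2.9.0 or later.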
paul sorenson
2005-Dec-04 01:20 UTC
Something like A[grep('^ab', as.vector(A$M)), ] might work. The ^ anchors the pattern to the start of the string, so only values beginning with "ab" are matched.
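To see why the ^ matters, here is a small sketch with a hypothetical vector ("cab" and "xabx" are invented to show an unanchored match). It also shows the conversion step: before R 4.0.0, data.frame() stored strings as factors by default, which is presumably why as.vector() appears above; as.character() does the same job for a factor.

```r
# Hypothetical values: "cab" and "xabx" contain "ab" but do not start with it.
M <- c("ab", "abc", "cab", "xabx", "abcd", "fg")

# Unanchored: "ab" matches anywhere in the string.
grep("ab", M)                   # → 1 2 3 4 5

# Anchored: "^ab" matches only strings that start with "ab".
grep("^ab", M)                  # → 1 2 5

# A factor column (the old data.frame() default) converted to character first.
f <- factor(M)
grep("^ab", as.character(f))    # → 1 2 5
```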