Hello,

I would like to have a function that retrieves matching substrings in the same way as java.util.regex (Java 1.4.2).

Example:

    f('^.*(xx?)\\.([0-9]*)$','abcxx.785')
    =>
    c('xx','785')

First of all: is it possible to achieve this with grep(..., perl=TRUE, value=TRUE)?

As I would call this function very often on large data, I'm reluctant to use SJava for performance reasons. Am I wrong to assume that calling Java would be slower or use more memory than a native R function?

Does someone already have a solution for this? :)

Thanks,

Marc Mamin
Gabor Grothendieck
2004-Oct-27 14:02 UTC
[R] regexp,grep: capturing more than one substring
Marc Mamin <M.Mamin <at> intershop.de> writes:

: Hello,
:
: I would like to have a function that retrieves matching substrings in the
: same way as java.util.regex (Java 1.4.2).
:
: Example:
:
: f('^.*(xx?)\\.([0-9]*)$','abcxx.785')
: =>
: c('xx','785')
:
: First of all: is it possible to achieve this with
: grep(..., perl=TRUE, value=TRUE)?

Actually you don't even need perl= to do that. The function below pastes together a replacement string like "\\1 \\2" (with sep in place of the space), where n determines how many backreferences it contains. It then calls gsub with the regexp in r, and finally splits the result into individual strings.

The calculation of n, the number of backreferences, is not foolproof: it simply counts the opening parentheses in r. You can therefore specify your own n if your expression has parentheses that are not backreferences. Specifying n might also speed it up a bit, e.g. n = 2 in the example.

The value of sep= should be a delimiter that does not occur in your strings.

s can be a vector of strings. The function returns a list of character vectors in any case, one element of the list for each element of s. If s is just a scalar string, it returns a one-element list containing the captured substrings as a vector. You may wish to call it like this, f(...args...)[[1]], in that case, as shown in the example.

    f <- function(r, s, n = nchar(gsub("[^(]","",r)), sep = "\10" ) {
        # build the replacement "\\1<sep>\\2<sep>...<sep>\\n"
        x <- gsub(r, paste("\\", 1:n, sep = "", collapse = sep), s)
        # split the substituted string back into the captured pieces
        strsplit(x, split = sep)
    }

    f( '^.*(xx?)\\.([0-9]*)$', 'abcxx.785' )[[1]]
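For readers on a later R release than the one current when this thread was written, here is a minimal sketch of the same idea using regexec() and regmatches() from base R. This is a different technique from the gsub/strsplit approach above, not a rewrite of it. The helper name capture_groups is hypothetical and only used for illustration. It needs neither n nor a separator character, and a string that does not match yields an empty vector rather than the unchanged input.

```r
# Hypothetical helper; the name capture_groups does not come from the thread.
capture_groups <- function(pattern, strings) {
  # regexec() records the position of the full match and of every
  # parenthesised group; regmatches() extracts the matching text.
  matches <- regmatches(strings, regexec(pattern, strings))
  # Drop the first element (the full match) and keep only the groups.
  lapply(matches, function(m) m[-1])
}

# One list element per input string, as with f() above.
capture_groups("^([a-z]+)-([0-9]+)$", c("abc-785", "no match here"))
```

As with f(), append [[1]] when passing a single string. One thing to watch with the pattern from the question: the leading .* is greedy, so it may well consume the first 'x' and leave the first group with 'x' rather than 'xx'. Anchoring the prefix more tightly (for instance with a character class that excludes 'x') is one way to avoid that, assuming it suits your data.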