Good morning R list,

My apologies if this has already been answered elsewhere, but I have not found the answer that I am looking for.

I have a character string, i.e.

    form<-c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M')

Now, my aim is to find the position of all instances of '*' and to remove them. However, I would also like to remove the variable name preceding the '*', the math operator preceding that, and also the variable name after the '*'. So, here I would like to remove '+L*M'.

So far, I have come up with the following code:

    parts<-strsplit(form,' ')
    index<-which(unlist(parts)=="*")
    for (i in 1:length(index)){
        parts[[1]][index[i]]<-list(NULL)
        parts[[1]][index[i]+1]<-list(NULL)
        parts[[1]][index[i]-1]<-list(NULL)
        parts[[1]][index[i]-2]<-list(NULL)
    }
    new.form<-unlist(parts)

    form<-new.form[0]
    for (i in 1: length(new.form)){
        form<-paste(form,new.form[i], sep="")
    }

However, as you can see, I have had to use strsplit in what I consider a rather clumsy manner, as the character string (form) has to be in a certain format: all variables and maths operators require a space between them in order for strsplit to work in the manner I require.

I would very much like to accomplish what the above code already does, but without requiring those spaces in the initial character string.

If the list can offer help, I would be most appreciative.

Yours,

Mike Griffiths

--
Michael Griffiths, Ph.D
Statistician
Upstream Systems
www.upstreamsystems.com

Hi,

On Mon, Nov 14, 2011 at 4:20 AM, Michael Griffiths <griffiths at upstreamsystems.com> wrote:
> Good morning R list,
>
> My apologies if this has *already* answered elsewhere, but I have not found
> the answer that I am looking for.
>
> I have a character string, i.e.
>
> form<-c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M')
>
> Now, my aim is to find the position of all those instances of '*' and to
> remove said '*'. However, I would also like to remove the preceding
> variable name before the '*', the math operator preceding this, and also
> the variable name after the '*'. So, here I would like to remove '+L*M'

You just want to get rid of them? gsub() it is. I've changed your formula a little bit to better demonstrate what's going on:

    > form<-c('~ A + B * C + C / D + E + E / F * G + H + I + J + K + L * M')
    > gsub(" \\+ [A-Z] \\* [A-Z]", "", form)
    [1] "~ A + C / D + E + E / F * G + H + I + J + K"

That regular expression matches a space, a '+', a space, any capital letter, a space, a '*', a space, and any capital letter. gsub() takes out all occurrences of that sequence, but won't take out occurrences of '*' that are not in that sequence (such as the 'F * G' above, which follows a '/' rather than a '+').

If you don't want the spaces, you don't need them. Just take them out of the regular expression as well.

Not that strsplit() was remotely the right tool here, but you can split into characters without a separator:

    > form <- 'abcd'
    > strsplit(form, '')[[1]]
    [1] "a" "b" "c" "d"

Sarah

--
Sarah Goslee
http://www.functionaldiversity.org

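As a follow-up to the suggestion of taking the spaces out of the pattern, here is a minimal sketch of a variant that tolerates any amount of whitespace (including none) around the operators. The helper name `drop_star_terms` and the assumed shape of a variable name (letters, digits, '_' and '.') are illustrative choices, not something taken from the thread.

```r
# Hypothetical helper (not from the thread): drop every "+ X * Y" term,
# tolerating zero or more whitespace characters around the operators.
drop_star_terms <- function(form) {
  name <- "[[:alnum:]_.]+"   # assumed shape of a variable name
  sp   <- "[[:space:]]*"     # zero or more whitespace characters
  pattern <- paste0(sp, "\\+", sp, name, sp, "\\*", sp, name)
  gsub(pattern, "", form)
}

# Illustrative inputs: spaced, unspaced, and mixed.
forms <- c("~ A + B + K + L * M",
           "~ A+B+K+L*M",
           "~ A + B * C + C / D + E / F * G + Llll*M")
drop_star_terms(forms)
```

As in the pattern above, a '*' that follows a '/' (the 'F * G' term) is left alone, because the match has to begin with a '+'.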
On Nov 14, 2011, at 4:20 AM, Michael Griffiths wrote:
> Good morning R list,
>
> My apologies if this has *already* answered elsewhere, but I have
> not found the answer that I am looking for.
>
> I have a character string, i.e.
>
> form<-c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M')
>
> Now, my aim is to find the position of all those instances of '*'
> and to remove said '*'. However, I would also like to remove the preceding
> variable name before the '*', the math operator preceding this, and
> also the variable name after the '*'. So, here I would like to remove
> '+L*M'

This would be a very narrow implementation that requires the +/spc/alnum/spc/*/alnum sequence exactly:

    > sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]*", "", form)
    [1] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "

This is a more general implementation using the "*" quantifier, which matches the preceding item 0 or more times:

    form<-c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M',
            '~ A + B + C + C / D + E + E / F + G + H + I + J + K + L*M',
            '~ A + B + C + C / D + E + E / F + G + H + I + J + K +Llll*M' )

    > sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]*", "", form)
    [1] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
    [2] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
    [3] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "

--
David Winsemius, MD
West Hartford, CT
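One point worth noting about the calls above: sub() replaces only the first match in each string, whereas gsub() replaces every match. The sketch below reuses David's pattern unchanged on an illustrative input with two '*' terms (the input string and the whitespace tidy-up step are assumptions added for illustration, not part of the reply).

```r
# David's pattern, copied from the reply above.
pattern <- "\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]*"

# Illustrative input (not from the thread) containing two '*' terms.
x <- "~ A + B * C + D + E * F"

first_only <- sub(pattern, "", x)    # removes only the first '*' term
all_terms  <- gsub(pattern, "", x)   # removes every '*' term

# Removing terms leaves stray spaces behind; collapse runs of
# whitespace and trim the ends (trimws() needs R >= 3.2.0).
tidy <- trimws(gsub("[[:space:]]+", " ", all_terms))
```

Also note that the '.' after '\\*' in the pattern consumes exactly one character of any kind, so it absorbs either the space in 'L * M' or the 'M' in 'L*M'; with more than one space after the '*' the trailing variable name would probably be left behind.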