thr3ads.net - R help - [R] Help with text separation [Nov 2011]

Michael Griffiths

2011-Nov-14 09:20 UTC

[R] Help with text separation

Good morning R list,

My apologies if this has *already* been answered elsewhere, but I have not found
the answer that I am looking for.

I have a character string, i.e.


form<-c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M')

Now, my aim is to find the position of all those instances of '*' and to
remove said '*'. However, I would also like to remove the preceding
variable name before the '*', the math operator preceding this, and also
the variable name after the '*'. So, here I would like to remove
'+L*M'

So far, I have come up with the following code:

parts<-strsplit(form,' ')
index<-which(unlist(parts)=="*")
for (i in 1:length(index)){
    parts[[1]][index[i]]<-list(NULL)
    parts[[1]][index[i]+1]<-list(NULL)
    parts[[1]][index[i]-1]<-list(NULL)
    parts[[1]][index[i]-2]<-list(NULL)
}
new.form<-unlist(parts)

form<-new.form[0]
for (i in 1: length(new.form)){
    form<-paste(form,new.form[i], sep="")
}

However, as you can see, I have had to use strsplit in what I consider a
rather clumsy manner, because the character string (form) has to be in a
certain format: every variable and maths operator must be separated by a
space for strsplit to work in the manner I require.

I would very much like to accomplish what the above code already does, but
without requiring the initial character string to contain the
aforementioned spaces.

If the list can offer help, I would be most appreciative.

Yours

Mike Griffiths




-- 

Michael Griffiths, Ph.D.
Statistician

Upstream Systems


Sarah Goslee

2011-Nov-14 12:09 UTC


Hi,

On Mon, Nov 14, 2011 at 4:20 AM, Michael Griffiths
<griffiths at upstreamsystems.com> wrote:
> Good morning R list,
>
> My apologies if this has *already* been answered elsewhere, but I have not found
> the answer that I am looking for.
>
> I have a character string, i.e.
>
> form<-c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M')
>
> Now, my aim is to find the position of all those instances of '*' and to
> remove said '*'. However, I would also like to remove the preceding
> variable name before the '*', the math operator preceding this, and also
> the variable name after the '*'. So, here I would like to remove '+L*M'

You just want to get rid of them? gsub() it is.

I've changed your formula a little bit to better demonstrate what's
going on:

> form<-c('~ A + B * C + C / D + E + E / F * G + H + I + J + K + L * M')
> gsub(" \\+ [A-Z] \\* [A-Z]", "", form)
[1] "~ A + C / D + E + E / F * G + H + I + J + K"

That regular expression will take out a
space
+
space
any capital letter
space
*
space
any capital letter.

It will take out all occurrences of that sequence, but won't take out
occurrences of * not in that sequence.

If you don't want the spaces, you don't need them. Just take them out
of the regular expression as well.
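
As a rough sketch of that suggestion: rather than deleting the spaces from the pattern outright, each one can be made optional with " *" (zero or more spaces), so the same call copes with spaced and unspaced input. The names spaced, tight and pattern are only illustrative and are not from the original post, and the pattern still assumes single capital-letter variable names.

```r
# Two versions of a similar formula string: with and without spaces.
spaced <- "~ A + B * C + C / D + E / F * G + H + L * M"
tight  <- "~A+B*C+C/D+E/F*G+H+L*M"

# " *" means "zero or more spaces", so whitespace becomes optional.
# "\\+" and "\\*" are the literal plus and star characters.
pattern <- " *\\+ *[A-Z] *\\* *[A-Z]"

res_spaced <- gsub(pattern, "", spaced)
res_tight  <- gsub(pattern, "", tight)

print(res_spaced)
print(res_tight)
```

In both strings the "F * G" term survives, because it follows a "/" rather than a "+".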

Not that strsplit() was remotely the right tool here, but you can
split into characters without a separator:

> form <- 'abcd'
> strsplit(form, '')[[1]]
[1] "a" "b" "c" "d"
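
If a token-based approach like the original loop is still preferred, one possible sketch is to extract tokens with regmatches() and gregexpr() instead of splitting on spaces, so the spacing of the input no longer matters. The helper name drop_star_terms is hypothetical, and the sketch assumes every '*' has an operator and a variable name before it and a variable name after it.

```r
# Hypothetical helper: drop every "<operator> X * Y" group from a formula
# string, whether or not the tokens are separated by spaces.
drop_star_terms <- function(form) {
  # A token is a run of alphanumerics, or any single non-space symbol.
  tokens <- regmatches(
    form,
    gregexpr("[[:alnum:]]+|[^[:alnum:][:space:]]", form)
  )[[1]]

  star <- which(tokens == "*")

  # For each "*", mark the operator and name before it, the "*" itself,
  # and the name after it.
  drop <- unlist(lapply(star, function(i) (i - 2):(i + 1)))
  drop <- drop[drop >= 1 & drop <= length(tokens)]

  keep <- setdiff(seq_along(tokens), drop)
  paste(tokens[keep], collapse = " ")
}

print(drop_star_terms("~A+B+C/D+K+L*M"))
print(drop_star_terms("~ A + B + C / D + K + L * M"))
```

Both calls return the same space-separated string, since the tokens are re-joined with single spaces regardless of how the input was laid out.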

Sarah

-- 
Sarah Goslee
http://www.functionaldiversity.org

David Winsemius

2011-Nov-14 17:05 UTC


On Nov 14, 2011, at 4:20 AM, Michael Griffiths wrote:
> Good morning R list,
>
> My apologies if this has *already* been answered elsewhere, but I have not found
> the answer that I am looking for.
>
> I have a character string, i.e.
>
> form<-c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M')
>
> Now, my aim is to find the position of all those instances of '*' and to
> remove said '*'. However, I would also like to remove the preceding
> variable name before the '*', the math operator preceding this, and also
> the variable name after the '*'. So, here I would like to remove '+L*M'

This would be a very narrow implementation that requires the
+/spc/alnum/spc/*/alnum sequence exactly:

> sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]*", "", form)
[1] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "

This is a more general implementation using the "*" quantifier, which
matches the preceding item 0 or more times.

  form<-c('~ A + B + C + C / D + E + E / F + G + H + I + J + K + L * M',
  '~ A + B + C + C / D + E + E / F + G + H + I + J + K + L*M',
  '~ A + B + C + C / D + E + E / F + G + H + I + J + K +Llll*M'
  )
> sub("\\+*\\s*[[:alnum:]]*\\s*\\*.[[:alnum:]]*", "", form)
[1] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
[2] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
[3] "~ A + B + C + C / D + E + E / F + G + H + I + J + K "
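
Two things are worth noting about the calls above: sub() replaces only the first match in each string, and the results keep a trailing space. The sketch below uses a slightly tightened pattern of my own (an assumption, not the code posted above) that also consumes the whitespace before the "+" and requires at least one alphanumeric character on each side of the "*". It then contrasts sub() with gsub() on a string containing two such terms.

```r
# Example string with two "+ X * Y" terms, one of them with a longer name.
form <- "~ A + B*C + D + Llll * M"

# Assumed variant of the pattern: optional whitespace, a literal "+",
# a variable name, a literal "*", and another variable name.
pattern <- "[[:space:]]*\\+[[:space:]]*[[:alnum:]]+[[:space:]]*\\*[[:space:]]*[[:alnum:]]+"

first_only <- sub(pattern, "", form)   # removes only the first matching term
all_terms  <- gsub(pattern, "", form)  # removes every matching term

print(first_only)
print(all_terms)
```

Because the leading whitespace is part of the match, no stray spaces are left behind; with the original pattern, trimws() on the result would be one way to tidy the trailing space.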


---stripped out code---

-- 
David Winsemius, MD
West Hartford, CT
