thr3ads.net - R help - [R] Calculating distance between words in string [Nov 2015]


Karl

2015-Nov-06 11:28 UTC

[R] Calculating distance between words in string

Hi All,

Using R for text processing is quite new to me. While I have found a lot of
useful functions and I'm beginning to learn regex, I need help with the
following task: how do I calculate the distance between words?

That is, given a specific keyword, I need to assign labels to the other
words based on the distance (number of words) to this keyword.

For example, if the keyword is "amet" and the string of words is:
 "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
 -> "dolor" would get a value of -2
 -> "elit" would get a value of 3

If the sentence contains more than one instance of the keyword, I need
values for each instance. Moreover, one can assume that I can split my data
into sentences, so there is no need to search and recognize sentences (this
is a separate problem).

Thank you!

Best regards,
Jay


David Winsemius

2015-Nov-06 16:56 UTC


> On Nov 6, 2015, at 3:28 AM, Karl <josip.2000 at gmail.com> wrote:
> 
> Hi All,
> 
> Using R for text processing is quite new to me, while I have found a lot of
> useful functions and I'm beginning to learn regex, I need help with the
> following task. How do I calculate the distance between words?
> 
> That is, given a specific keyword, I need to assign labels to the other
> words based on the distance (number of words) to this keyword.
> 
> For example, if the keyword is "amet" and the string of words is

strng <- "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
> -> "dolor" would get a value of -2
> -> "elit" would get a value of 3
words <- unlist(strsplit(strng, "\\W"))
words[words != ""]
#[1] "Lorem"       "ipsum"       "dolor"       "sit"
#[5] "amet"        "consectetur" "adipiscing"  "elit"
real <- words[words != ""]

which(real == "amet")
#[1] 5
length(real)
#[1] 8
vec <- seq_along(real) - which(real == "amet")
names(vec) <- real

vec["dolor"]
#dolor 
#   -2 
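
When the keyword occurs more than once, which(real == "amet") returns several positions and the single subtraction above no longer gives one set of values per instance. A minimal sketch of one possible extension, assuming the same non-word-character tokenization; the helper name keyword_distances and the "keyword@position" row labels are made up for illustration and are not from the thread:

```r
# Hypothetical helper (not from the thread): signed word distances from
# every instance of `keyword` to every word position in `sentence`.
keyword_distances <- function(sentence, keyword) {
  words <- unlist(strsplit(sentence, "\\W+"))
  words <- words[words != ""]        # drop an empty token left by leading punctuation
  hits <- which(words == keyword)    # position of each keyword instance
  if (length(hits) == 0L) return(NULL)  # keyword not present
  # Row i holds (word position - hits[i]) for every word position
  d <- outer(hits, seq_along(words), function(k, p) p - k)
  dimnames(d) <- list(paste0(keyword, "@", hits), words)
  d
}

d <- keyword_distances("amet dolor sit amet, consectetur elit.", "amet")
d[, "dolor"]   # distance of "dolor" from each instance of "amet"
```

Each row corresponds to one keyword instance, so a word that precedes one instance and follows another gets both a negative and a positive value.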

>
> If the sentence contains more than one instance of the keyword, I need
> values for each instance. Moreover, one can assume that I can split my data
> into sentences, so there is no need to search and recognize sentences (this
> is a separate problem).
> 
> Thank you!
> 
> Best regards,
> Jay
David Winsemius
Alameda, CA, USA

S Ellison

2015-Nov-11 13:15 UTC


> -----Original Message-----
> From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Karl
> Subject: [R] Calculating distance between words in string
>
> .. given a specific keyword, I need to assign labels to the other words
> based on the distance (number of words) to this keyword.
> 
>...
> If the sentence contains more than one instance of the keyword, I need values
> for each instance.
What would you like to happen when the sentence contains more than one instance
of another word, or more than one instance of both the keyword and another word?

e.g. what output do you want from
" amet is not the only instance of 'amet', and there is more than
one instance of 'instance', 'is', 'of' and 'and'."
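
One way to sidestep the ambiguity is to label word positions rather than word names, so that every repeated word keeps its own row. The sketch below assumes that output shape, which the thread does not confirm; the column names position, keyword_position, word and distance are illustrative only:

```r
sentence <- " amet is not the only instance of 'amet', and there is more than one instance of 'instance', 'is', 'of' and 'and'."
keyword <- "amet"

words <- unlist(strsplit(sentence, "\\W+"))
words <- words[words != ""]          # the leading space yields an empty first token
hits <- which(words == keyword)      # positions of the keyword instances

# One row per (word position, keyword instance) pair
pairs <- expand.grid(position = seq_along(words), keyword_position = hits)
pairs$word <- words[pairs$position]
pairs$distance <- pairs$position - pairs$keyword_position

# Every occurrence of "is" relative to every occurrence of "amet"
pairs[pairs$word == "is", ]
```

Because rows are keyed by position, no values are lost when a word such as 'is' or 'instance' appears several times.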


S Ellison



Jim Lemon

2015-Nov-11 22:02 UTC


Perhaps what you are seeking is a sparse distance matrix.

"How far is each word from every other matching word"

sentence<-"How far is each word from every other matching word"
words<-tolower(unlist(strsplit(sentence," ")))
nwords<-length(words)
wdm<-matrix(NA,nrow=nwords,ncol=nwords)
for(word in seq_len(nwords)) {
 # exact match; grep(..., fixed=TRUE) would also match substrings of longer words
 wordmatch<-which(words==words[word])
 wdm[word,wordmatch]<-wordmatch-word
}
rownames(wdm)<-colnames(wdm)<-words
wdm

The result contains zeros for a self-match, relative positions for the
desired matches and NA for non-matches.
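
The same matrix can also be built without an explicit loop. This is only a sketch of an equivalent formulation, assuming exact (whole-word) matching and the same space-based split as above; the intermediate names idx, same and offset are not from the thread:

```r
sentence <- "How far is each word from every other matching word"
words <- tolower(unlist(strsplit(sentence, " ")))
idx <- seq_along(words)

same <- outer(words, words, "==")                 # TRUE where row word equals column word
offset <- outer(idx, idx, function(i, j) j - i)   # column position minus row position
wdm <- ifelse(same, offset, NA)                   # keep offsets only for matching words
dimnames(wdm) <- list(words, words)

wdm[5, ]   # row for the first "word": 0 at position 5, 5 at position 10, NA elsewhere
```

The row is selected by index rather than by name because "word" appears twice in the row names.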

Jim



On Thu, Nov 12, 2015 at 12:15 AM, S Ellison <S.Ellison at lgcgroup.com> wrote:
> > -----Original Message-----
> > From: R-help [mailto:r-help-bounces at r-project.org] On Behalf Of Karl
> > Subject: [R] Calculating distance between words in string
> >
> > .. given a specific keyword, I need to assign labels to the other words
> > based on the distance (number of words) to this keyword.
> >
> >...
> > If the sentence contains more than one instance of the keyword, I need values
> > for each instance.
>
> What would you like to happen when the sentence contains more than one
> instance of other words and more than one instance of both?
>
> e.g. what output do you want from
> " amet is not the only instance of 'amet', and there is more than one
> instance of 'instance', 'is', 'of' and 'and'."
>
>
> S Ellison
