thr3ads.net - R help - [R] Select elements from text [Jan 2012]


mdvaan

2012-Jan-24 14:52 UTC

[R] Select elements from text

Hi,

I have a series of MS Word files and each file contains plain text. From
these texts I would like to extract only those elements (read: words) that
are between square brackets. Example of a text:

Most fundamentally, it has led to an effort to clarify the organizational
form concept. According to them [see also Smith, Jones and Carroll 2002],
categories emerge as audience members recognize dissimilarities among groups
of consumers and label them as members of a common set [Nicol 2000].

Now I would like to get the following selection:

see also Smith, Jones and Carroll 2002
Nicol 2000

Any ideas on how to do this? What would be the best way to import the text
in R? The entire text as an element in a dataframe? Thank you very much!

Best,

Mathijs


--
View this message in context:
http://r.789695.n4.nabble.com/Select-elements-from-text-tp4323947p4323947.html
Sent from the R help mailing list archive at Nabble.com.

Justin Haynes

2012-Jan-24 15:16 UTC


How about splitting the text on spaces, e.g. with
scan(..., what = character(), sep = " ")? (read.table(..., sep = " ") also
splits on spaces, but it returns a data frame rather than a vector.)

That would give you a character vector of single words. Then

grepl("\\[[9-z]+\\]", x)

will return a logical vector:

> x <- c('test', '[bracket]', 'hi]', '[blah', 'foo', '[bar]')
> grepl('\\[[9-z]+\\]', x)
[1] FALSE  TRUE FALSE FALSE FALSE  TRUE
> x[grepl('\\[[9-z]+\\]', x)]
[1] "[bracket]" "[bar]"

You might need a more complex regex to catch them all, in case of
([citation]) instances, for example.

Justin
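One caveat with the word-by-word approach: a citation such as
[see also Smith, Jones and Carroll 2002] spans several space-separated
tokens, so no single token contains both brackets. Also, in ASCII order the
class [9-z] leaves out the digits 0 to 8. A minimal sketch of an alternative,
assuming the whole text is held in one string, uses base R's gregexpr() and
regmatches(); the variable names txt, hits and citations are only
illustrative.

```r
# Example text from the original question, held as a single string.
txt <- paste(
  "Most fundamentally, it has led to an effort to clarify the organizational",
  "form concept. According to them [see also Smith, Jones and Carroll 2002],",
  "categories emerge as audience members recognize dissimilarities among groups",
  "of consumers and label them as members of a common set [Nicol 2000]."
)

# "\\[[^]]+\\]" matches "[", then one or more characters other than "]", then "]".
hits <- regmatches(txt, gregexpr("\\[[^]]+\\]", txt))[[1]]

# Strip the enclosing brackets from each match.
citations <- gsub("^\\[|\\]$", "", hits)
print(citations)
```

Because the pattern never crosses a closing bracket, each citation is
returned separately, and a ([citation]) instance would be picked up as well.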


mdvaan

2012-Jan-24 20:41 UTC


Thanks for the quick response. I get the latter part, but reading the text
from MS Word into R is problematic. I am able to read in (scan) all the
individual elements (using sep=" ") from the text, but I am unable to paste
everything together again. Any idea how to solve this? It looks like this now:

> text <- scan("test.txt", character(0), sep = " ")
> text
 [1] "Most"            "fundamentally,"  "it"              "has"
 [5] "led"             "to"              "an"              "effort"
 [9] "to"              "clarify"         "the"             "organizational"
[13] "form"            "concept."        "According"       "to"
[17] "them"            "[see"            "also"            "Smith,"
[21] "Jones"           "and"             "Carroll"         "2002],"
[25] "categories"      "emerge"          "as"              "audience"
[29] "members"         "recognize"       "dissimilarities" "among"
[33] "groups"          "of"              "consumers"       "and"
[37] "label"           "them"            "as"              "members"
[41] "of"              "a"               "common"          "set"
[45] "[Nicol"          "2000]."


R. Michael Weylandt

2012-Jan-25 00:40 UTC


paste(text, collapse = " ")

Michael
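To illustrate the suggestion, here is a small sketch that splits a line into
words with scan() and joins them again with paste(..., collapse = " "). It
also shows readLines() as a possible way to skip the word split altogether.
The temporary file and the names path, words, rejoined and whole are
assumptions made for the example, standing in for the text exported from Word.

```r
# Write a sample sentence to a temporary file (stand-in for the exported text).
path <- tempfile(fileext = ".txt")
writeLines("label them as members of a common set [Nicol 2000].", path)

# scan() with sep = " " returns one element per word ...
words <- scan(path, what = character(), sep = " ", quiet = TRUE)

# ... and paste(collapse = " ") joins them back into a single string.
rejoined <- paste(words, collapse = " ")

# Alternative: read whole lines and join them, without splitting into words.
whole <- paste(readLines(path), collapse = " ")

print(rejoined)
print(identical(rejoined, whole))
```

Either way the result is one string per file, which could then be stored as a
single element of a character vector or data frame column.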


mdvaan

2012-Jan-25 16:36 UTC


Thanks. That worked great!

