thr3ads.net - R help - [R] How do I use R to build a dictionary of proper nouns? [May 2017]


Eva <yarmi1224 at hotmail.com>

2017-May-05 05:58 UTC

[R] How do I use R to build a dictionary of proper nouns?

Shared via OneDrive:

2.corpus_patent text.PNG <https://1drv.ms/u/s!Aq27nOPOP5izgVRRxXomVBv0YV0j>
3ontology_proper nouns keywords.PNG <https://1drv.ms/u/s!Aq27nOPOP5izgVURiS7MbYH6hJzo>
1.patents.PNG <https://1drv.ms/u/s!Aq27nOPOP5izgVYuRVxM1OyzIPzF>

Hi,

I want to do text mining on patents in R.
I need to use the proper nouns of a domain ontology to build a dictionary,
then use that dictionary to analyze my corpus of patent files.
I want to count the proper nouns and get the frequency with which each one
appears in each file.

I have already preprocessed the corpus and extracted the proper nouns from the
domain ontology, but I have no idea how to build a proper-noun dictionary and
use it to analyze my corpus.

The attachments are my texts, my corpus preprocessing, and my proper nouns.

Thanks.


Boris Steipe

2017-May-05 08:39 UTC


Did you try using the table() function, possibly in combination with sort() or
rank()?

Consider:

# A character vector of nouns (here, the nouns from your own question)
myNouns <- c("proper", "nouns", "domain", "ontology", "dictionary",
             "dictionary", "corpus", "patent", "files", "proper", "nouns",
             "word", "frequency", "file", "preprocess", "corpus", "proper",
             "nouns", "domain", "ontology", "idea", "nouns", "dictionary",
             "dictionary", "corpus", "attachments", "texts", "corpus",
             "preprocesses", "proper", "nouns")

# Count how often each distinct noun occurs
myNounFrequencies <- table(myNouns)
myNounFrequencies

# Order the counts from most to least frequent
myNounFrequencies <- sort(myNounFrequencies, decreasing = TRUE)
myNounFrequencies

# Position of "corpus" in the sorted table
which(names(myNounFrequencies) == "corpus")
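The same table() idea can be extended to per-file counts: split each document into tokens, then tabulate only the tokens that belong to the dictionary. This is a minimal base-R sketch, assuming the documents are already preprocessed plain strings and the dictionary holds single words only; `docs`, `dict` and `count_dictionary_terms()` are hypothetical names chosen for illustration.

```r
# Hypothetical, already-preprocessed documents (one string per file)
docs <- c(file1 = "slurri pad wafer pad polish",
          file2 = "pad condit slurri slurri sensor",
          file3 = "wafer cassett transfer robot")

# Hypothetical single-word dictionary
dict <- c("pad", "slurri", "sensor", "wafer")

# Count dictionary words in each document.
# Turning the tokens into a factor whose levels are the dictionary makes
# table() report every dictionary word (including zero counts); tokens that
# are not in the dictionary become NA and are dropped by table().
count_dictionary_terms <- function(docs, dict) {
  counts <- lapply(docs, function(d) {
    tokens <- strsplit(d, "[[:space:]]+")[[1]]
    as.vector(table(factor(tokens, levels = dict)))
  })
  m <- do.call(rbind, counts)
  dimnames(m) <- list(names(docs), dict)
  m
}

freq <- count_dictionary_terms(docs, dict)
freq
```

Rows are files and columns are dictionary words; colSums(freq) would then give corpus-wide totals. Multi-word entries are not handled by this token-based approach.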

Boris Steipe

2017-May-08 09:09 UTC


Your workflow is not clear to me, so I can't give any specific advice.

1: I don't understand what you need. Do you need the column names changed?
   They correspond to the matched words.

2: How was the vector dictionary_word created? These are (mostly) stemmed nouns,
   but some of them consist of two or even three words. Did you create them by
   hand? The vector also contains "cmp", which is not a stemmed word, and
   "particl" and "recoveri", which are not correctly stemmed. This doesn't look
   promising. I think you will at least need to place hyphens between the words,
   but since you are using stemmed words this will be difficult.

3: Since the default tokenizer is "words", I think the two-word and three-word
   elements of the dictionary_word vector will not be found; they don't exist
   as tokens.

4: Don't use "list" as a variable name.
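Points 2 and 3 can be illustrated without tm. The sketch below uses a plain whitespace split as a stand-in for a word tokenizer (an assumption: tm's own tokenizers differ in details such as punctuation handling). It shows that a multi-word entry never appears as a token, and that joining each phrase with hyphens in the text before tokenizing turns it into a single token. `join_phrases()` is a hypothetical helper, and it assumes the phrases contain no regular-expression metacharacters.

```r
text <- "the slurri flow rate and the polish head control the pad"
dict <- c("slurri flow rate", "polish head", "pad")

# Whitespace tokenization: multi-word entries cannot match a single token
tokens <- strsplit(text, "[[:space:]]+")[[1]]
dict %in% tokens   # only "pad" is found

# Hypothetical helper: rewrite each multi-word phrase in the text as a
# hyphenated word so that it survives tokenization as one token.
# Longer phrases are processed first so a shorter phrase cannot break up
# a longer one that contains it.
join_phrases <- function(text, phrases) {
  phrases <- phrases[order(nchar(phrases), decreasing = TRUE)]
  for (p in phrases[grepl(" ", phrases, fixed = TRUE)]) {
    # whole-word match, any run of whitespace between the words
    pattern <- paste0("\\b", gsub(" ", "[[:space:]]+", p, fixed = TRUE), "\\b")
    text <- gsub(pattern, gsub(" ", "-", p, fixed = TRUE), text, perl = TRUE)
  }
  text
}

joined <- join_phrases(text, dict)
hyphen_dict <- gsub(" ", "-", dict, fixed = TRUE)
tokens2 <- strsplit(joined, "[[:space:]]+")[[1]]
hyphen_dict %in% tokens2   # all three entries are now found
```

The same joining (and the same stemming, if any) would have to be applied to both the corpus and the dictionary, which is where stemmed phrases make this awkward.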

In summary, I think your problems have to do with stemming and tokenizing, not
with formatting the output of DocumentTermMatrix(). I don't think tm has
functions to produce stemmed multi-word tokens like the elements of your
dictionary_word vector. You may need to do the analysis with your own functions,
using regular expressions.
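A regular-expression approach along these lines might look like the following base-R sketch. It assumes the documents and the dictionary have already been preprocessed the same way (lower case, same stemming) and that dictionary entries contain no regular-expression metacharacters; `docs`, `dict` and `count_phrase()` are hypothetical names. Each entry is matched as a whole-word phrase, and the result is a document-by-term matrix of raw counts.

```r
# Hypothetical preprocessed (lower-cased, stemmed) documents
docs <- c(doc1 = "the slurri flow rate is set by the slurri suppli system",
          doc2 = "a polish head presses the wafer onto the pad; the pad is a polyurethan pad",
          doc3 = "no dictionary terms here")

dict <- c("slurri flow rate", "slurri suppli system", "slurri",
          "polish head", "pad", "polyurethan pad")

# Count non-overlapping, whole-word matches of one phrase in one string.
# Any run of whitespace between the words of a phrase is accepted.
count_phrase <- function(phrase, text) {
  pattern <- paste0("\\b", gsub(" ", "[[:space:]]+", phrase, fixed = TRUE), "\\b")
  hits <- gregexpr(pattern, text, perl = TRUE)[[1]]
  if (hits[1] == -1) 0L else length(hits)   # gregexpr returns -1 for no match
}

# Document-by-term matrix: one row per document, one column per entry
dtm <- t(sapply(docs, function(d) vapply(dict, count_phrase, integer(1), text = d)))
dtm
```

One design choice to be aware of: nested entries are counted independently, so "slurri" is also counted inside "slurri flow rate" and "pad" inside "polyurethan pad". Whether that is desirable depends on the analysis.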


B.

> On May 8, 2017, at 3:56 AM, <yarmi1224 at hotmail.com> wrote:
> 
> Hi Steipe,
> Thanks for your suggestion.
> I have tried the DocumentTermMatrix() function from the tm package,
> but I would prefer the resulting matrix to show the frequency of each
> dictionary word. Is there any way to do this?
> Here are my code and result:
> 
> dictionary_word <- c("neutral", "abras particl", "acid", "apparatus",
>                      "back film", "basic", "carrier", "chemic",
>                      "chromat confoc", "clean system", "cmp", "compens type",
>                      "compress", "comsum", "control system", "down pressur",
>                      "dresser condition", "detect system", "flow rate control",
>                      "fractal type", "groov", "hard", "improv type", "infrar",
>                      "laser confoc", "layer", "measur system", "micro stuctur",
>                      "monitor system", "multi layer", "none-por", "nonwoven pad",
>                      "pad", "pad applic", "pad condit system", "pad materi",
>                      "pad properti", "pad structur", "ph sensor", "planet type",
>                      "plate", "plat", "poisson ratio", "polish head",
>                      "polish system", "polym pad", "polyurethan pad", "porous",
>                      "process paramet", "process path", "process time",
>                      "recoveri", "rotat speed", "rough", "scatter",
>                      "semiconductor cmp", "sensor", "signal acceptor",
>                      "singl layer", "slurri", "slurri flow rate",
>                      "slurri ph valu", "slurri stirrer", "slurri suppli system",
>                      "slurri temperatur", "slurri weight percentag",
>                      "storag cmp", "stylus profil", "substrat cmp", "thick",
>                      "transfer robot", "ultrason", "urethan pad",
>                      "wafer cassett", "wafer transfer system",
>                      "white light interferomet", "young modulus")
> 
> list<-inspect(DocumentTermMatrix(corpus_tm,
>                                  list(weighting =weightTf,
>                                       dictionary = dictionary_word)))
> 
> <keywords of dictionary.PNG>

Eva <yarmi1224 at hotmail.com>

2017-May-09 05:12 UTC


Hi Boris,
Thank you very much for your reply and your suggestions.
To show my workflow clearly, I have attached my code and document file.
My research goal is to identify the main technical topics of CMP (chemical
mechanical polishing), so I want to do text mining on related patent texts.
My text-mining process uses two approaches:
1. TF-IDF
2. A CMP ontology
I built the CMP ontology myself. It is used to build the dictionary and to
extract the proper nouns of CMP.
Here is my workflow for building a dictionary of proper nouns:
1. Read the ontology file into R.
2. Extract the proper nouns from the ontology.
3. Use the tm package for preprocessing:
    (remove "_", tolower, stripWhitespace, stemDocument)
4. Build a dictionary of proper nouns.
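Steps 1 to 4 could be sketched in base R as follows. This is only an illustration under stated assumptions: the ontology is assumed (hypothetically) to be exported as one class name per line with underscores between words, every class name is treated as a proper noun, and a textConnection stands in for the real file. Stemming (stemDocument in tm) is deliberately left out so the sketch needs no extra packages; `preprocess()` is a hypothetical helper that mirrors the other three tm steps named above.

```r
# Step 1 (hypothetical input): ontology class names, one per line.
# In practice this would be readLines() on the real ontology file.
con <- textConnection(c("Slurry_Flow_Rate", "Polish_Head", "  Pad  ", "Polish_Head"))
ontology_lines <- readLines(con)
close(con)

# Step 3: preprocessing that mirrors remove "_", tolower and stripWhitespace
# (stemming is omitted in this sketch)
preprocess <- function(x) {
  x <- gsub("_", " ", x, fixed = TRUE)   # underscores become word breaks
  x <- tolower(x)
  x <- gsub("[[:space:]]+", " ", x)      # collapse runs of whitespace
  trimws(x)                              # drop leading/trailing spaces
}

# Steps 2 and 4: treat each cleaned class name as a dictionary entry,
# removing duplicates
dictionary <- sort(unique(preprocess(ontology_lines)))
dictionary
```

Whatever preprocessing is chosen, the point is to apply exactly the same function (and the same stemmer, if one is used) to the patent texts, so that dictionary entries and document text are spelled identically before they are matched.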

Finally, I want to extract the proper nouns that appear in my patent documents
(corpus_tm), together with their frequencies.

Thanks,
Eva
