Hi,

I currently find myself manually sorting several hundred Google Alerts (GA) texts into those that are actually relevant to my research and those that are not (even though they were triggered by relevant search keywords).

Each week I receive several hundred GA emails such as:
https://www.dropbox.com/s/u7rp0ez1tamq001/Alerte%20Google%C2%A0-%20laitier%20-%20lucam1968%40gmail.com%20-%20Gmail.pdf?dl=0
and
https://www.dropbox.com/s/1ubx5enw6tc90hj/Google%20Alert%20-%20latte%20-%20lucam1968%40gmail.com%20-%20Gmail.pdf?dl=0

From these emails I create a file such as:
https://www.dropbox.com/s/y5yqcsxp1zcmnhc/test_sample.xlsx?dl=0

This is becoming very time-consuming, so I have decided to try applying artificial intelligence techniques to the problem. What I need are two separate steps:

(1) A procedure that reads the GA emails and creates a file like the Excel file shared above (only the first 3 columns).

(2) Some sort of supervised learning algorithm that can learn from my choices by example and classify on my behalf (see column 4 in the attached file). That is, I would take the output of step (1), classify a few hundred cases by hand, then let the algorithm learn from them and classify future data.

I plan to review this classification regularly, correct misclassifications, and retrain the algorithm so that it gets better at classifying the GA texts correctly.

Is my explanation clear enough? Can all of the above be done within R? If so, is there any package or procedure I should be using?

Thank you in advance for any suggestions.

Luca
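For step (1), a minimal base-R sketch is below, assuming each alert email has first been saved as plain text (for the PDF printouts linked above, the pdftools package's pdf_text() is one option, though PDF layout usually needs extra cleanup). The layout assumed here (a title line, a source line, a snippet line, then the link in angle brackets) is hypothetical, as are the function name parse_ga() and the column names; real GA emails may need different patterns, and the columns should be adapted to match test_sample.xlsx.

```r
# Sketch for step (1): parse the plain text of one Google Alerts email into a
# data frame with one row per alert item. The layout assumed here (title,
# source, snippet, then the URL in angle brackets) is HYPOTHETICAL.
parse_ga <- function(lines) {
  lines <- trimws(lines)
  # A line consisting only of "<http...>" is taken to close one alert item
  url_idx <- grep("^<https?://[^>]+>$", lines)
  # Drop any URL line that does not have three lines before it
  url_idx <- url_idx[url_idx > 3]
  data.frame(
    title   = lines[url_idx - 3],
    source  = lines[url_idx - 2],
    snippet = lines[url_idx - 1],
    url     = gsub("^<|>$", "", lines[url_idx]),  # strip the angle brackets
    stringsAsFactors = FALSE
  )
}

# Made-up example text standing in for one saved alert email; in practice
# something like readLines("alert.txt") (hypothetical file) would supply it.
ga_text <- c(
  "Google Alert - latte",
  "",
  "Title of first article",
  "example-news.com",
  "First snippet mentioning latte ...",
  "<https://www.example-news.com/article1>",
  "",
  "Title of second article",
  "another-site.it",
  "Second snippet mentioning latte ...",
  "<https://another-site.it/article2>"
)

alerts <- parse_ga(ga_text)
alerts$relevant <- NA  # column to fill in by hand (hypothetical name)
out_file <- file.path(tempdir(), "ga_alerts.csv")  # replace with your own path
write.csv(alerts, out_file, row.names = FALSE)
```

The regular expression and the line offsets are the only parts tied to the email format, so they are what would need changing for the real emails. For an .xlsx file instead of .csv, packages such as openxlsx or writexl can write Excel files.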
Luca:

1. We are not a consulting service. We *help* with R programming issues. Users are typically expected to make an effort by providing R code and, if appropriate, small data sets that illustrate their difficulties.

2. SEARCH! e.g. on "text processing R" or some such; or try Rseek.org with such searches. R has extensive text processing capabilities, e.g. via regexes.

3. "Supervised learning algorithm" is far too vague to be useful.

4. See this CRAN task view: https://cran.r-project.org/web/views/MachineLearning.html

5. The answer to your query is almost certainly yes, but you may have to do some reading to clarify your thinking. As this involves primarily statistical issues, you may wish to post on a statistical site like http://stats.stackexchange.com/ to get advice. R-help helps primarily with R programming, not statistical methodology (although the two do sometimes intersect).

Cheers,
Bert
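As one concrete instance of a "supervised learning algorithm" for step (2), here is a sketch of a multinomial naive Bayes text classifier in base R. Naive Bayes is only one common choice for short texts and was not proposed in this thread; the function names (tokenize(), train_nb(), predict_nb()), the "yes"/"no" labels and the toy training texts are all made up for illustration.

```r
# Sketch for step (2): multinomial naive Bayes text classifier in base R.
# Function names, labels and training texts are made up for illustration.

# Lower-case, turn every run of non-alphanumeric characters into a space, split
tokenize <- function(x) {
  words <- strsplit(gsub("[^[:alnum:]]+", " ", tolower(x)), " ")
  lapply(words, function(w) w[nchar(w) > 0])
}

train_nb <- function(text, label) {
  label   <- as.character(label)
  toks    <- tokenize(text)
  vocab   <- sort(unique(unlist(toks)))
  classes <- sort(unique(label))
  # Word counts per class (rows = words, columns = classes), add-one smoothed
  counts <- do.call(cbind, lapply(classes, function(cl) {
    as.numeric(table(factor(unlist(toks[label == cl]), levels = vocab))) + 1
  }))
  dimnames(counts) <- list(vocab, classes)
  list(
    vocab     = vocab,
    classes   = classes,
    log_prior = log(as.numeric(table(factor(label, levels = classes))) / length(label)),
    # Each column becomes P(word | class) and sums to 1
    log_lik   = log(sweep(counts, 2, colSums(counts), "/"))
  )
}

predict_nb <- function(model, text) {
  scores <- t(vapply(tokenize(text), function(w) {
    w <- w[w %in% model$vocab]  # words unseen in training are ignored
    model$log_prior + colSums(model$log_lik[w, , drop = FALSE])
  }, numeric(length(model$classes))))
  model$classes[max.col(scores, ties.method = "first")]
}

# Toy, made-up training data: "yes" = relevant, "no" = not relevant
train <- data.frame(
  text = c("milk prices rise for dairy farmers",
           "dairy cooperative reports higher milk output",
           "celebrity spotted drinking a latte in Milan",
           "new latte art contest at local cafe"),
  relevant = c("yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)
model <- train_nb(train$text, train$relevant)
predict_nb(model, c("milk output and dairy prices", "latte art in a cafe"))
# returns "yes" "no"
```

Retraining after correcting misclassifications amounts to calling train_nb() again on the updated labels. Ready-made alternatives exist on CRAN, e.g. naiveBayes() in the e1071 package, and text-handling packages such as tm or quanteda.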
Hi Bert,

Thank you for your useful suggestions. I will follow them and come back to this list with any specific R code issues I might have.

Kind regards,
Luca

(In reply to Bert Gunter <bgunter.4567 at gmail.com>, 2017-10-02 16:57 GMT+02:00.)