thr3ads.net - R help - [R] Essay identification [Jun 2005]

Werner Bier

2005-Jun-12 19:29 UTC

[R] Essay identification

Hi R-help,
 
I have a database of 10 students who have written a total of 78 essays.
The challenge? I would like to identify who wrote the 79th essay.
 
Has anybody used R in this context? 
 
Even if not, could you suggest which pattern recognition technique I might
apply?
 
Thanks a lot and regards,
Tom 


Berton Gunter

2005-Jun-12 21:43 UTC

[R] Essay identification

I assume that you know the usual procedure is to 'score' each essay by a
vector that gives the frequency of occurrence of commonly used words and
phrases (sometimes adding subject-matter-specific ones). These score
vectors, labeled by author, are then fed in as a "training set" to your
favorite supervised learning/classification procedure. R has many of these:
trees, logistic regression, boosting, random forests, SVMs, LDA, SOMs
(whoops, that's an unsupervised one), ... . Try
RSiteSearch('Classification', restrict = 'functions').

The devil is in the details as to what works best, I believe. With only 78
exemplars in 10 groups, unless there is a lot of separation (disparate
styles that you could probably detect manually) it may be difficult. It also
depends on how large each group is (balance is generally better).
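
The scoring-and-classifying procedure described above can be sketched in
base R roughly as follows. This is only a minimal illustration under stated
assumptions: the toy essays, the author labels, the word list `common_words`
and the helpers `word_freq` and `nearest_centroid` are all made up for the
example, and a nearest-centroid rule stands in for whichever classifier you
would actually pick. The leave-one-out loop at the end is one way to get a
rough feel for how well the groups separate when there are few essays.

```r
# Toy corpus: hypothetical essays and author labels (not from this thread).
essays <- c(
  "the cat sat on the mat and the dog sat on the rug",
  "the dog and the cat ran to the mat and the rug",
  "a storm of rain fell in a night of wind in a town of stone",
  "in a town of stone a bell of iron rang in a night of rain"
)
authors <- factor(c("A", "A", "B", "B"))

# Hypothetical list of commonly used words to count.
common_words <- c("the", "and", "on", "to", "a", "of", "in")

# Relative frequency of each listed word in one text.
word_freq <- function(text, vocab) {
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  counts <- table(factor(tokens, levels = vocab))
  as.numeric(counts) / length(tokens)
}

# Score matrix: one row per essay, one column per word.
X <- t(sapply(essays, word_freq, vocab = common_words, USE.NAMES = FALSE))
colnames(X) <- common_words

# Assign a new score vector to the author whose mean vector is closest
# (Euclidean distance). A simple stand-in for a real classifier.
nearest_centroid <- function(X_train, y_train, x_new) {
  groups <- unique(as.character(y_train))
  centroids <- t(sapply(groups, function(g) {
    colMeans(X_train[y_train == g, , drop = FALSE])
  }))
  d <- apply(centroids, 1, function(m) sqrt(sum((m - x_new)^2)))
  names(which.min(d))
}

# Classify a new, unlabeled essay.
new_essay <- "the bird sat on the wall and the fence"
predicted <- nearest_centroid(X, authors, word_freq(new_essay, common_words))

# Leave-one-out check: hold out each essay in turn and try to re-identify it.
loo_pred <- sapply(seq_len(nrow(X)), function(i) {
  nearest_centroid(X[-i, , drop = FALSE], authors[-i], X[i, ])
})
loo_accuracy <- mean(loo_pred == as.character(authors))
```

With real essays you would swap in a longer word list and one of the
classifiers named above. A low leave-one-out accuracy would suggest the
difficulty mentioned above, namely too little separation for 78 essays in
10 groups.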

Cheers,
Bert

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

Gabor Grothendieck

2005-Jun-12 22:05 UTC

[R] Essay identification

On 6/12/05, Werner Bier <aliscla at yahoo.com> wrote:
> Hi R-help,
>
> I have a database of 10 students who have written a total of 78 essays.
> The challenge? I would like to identify who wrote the 79th essay.
>
> Has anybody used R in this context?
>
> Even if not, could you suggest which pattern recognition technique I
> might apply?

Check out

http://xxx.uni-augsburg.de/PS_cache/cond-mat/pdf/0108/0108530.pdf

for a simple method.
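
One simple family of methods for this kind of problem is compression-based:
append the unknown text to each author's known text and see for which author
the compressed size grows least. The sketch below assumes that idea and is
not a transcription of the linked paper. The helper names `zip_size`,
`zip_delta` and `guess_author` and the toy corpus are hypothetical; only
memCompress() and charToRaw() are base R functions.

```r
# Compressed size in bytes of a character string (gzip via base R).
zip_size <- function(text) {
  length(memCompress(charToRaw(text), type = "gzip"))
}

# Extra bytes needed to compress `unknown` when it follows `reference`.
zip_delta <- function(reference, unknown) {
  zip_size(paste(reference, unknown)) - zip_size(reference)
}

# Pick the author whose reference text makes the unknown text cheapest
# to compress.
guess_author <- function(corpus, unknown) {
  deltas <- sapply(corpus, zip_delta, unknown = unknown)
  names(which.min(deltas))
}

# Hypothetical toy corpus: one concatenated reference text per author.
corpus <- c(
  anna = "the quick brown fox jumps over the lazy dog near the quiet river bank",
  ben  = "0123456789 9876543210 1357924680 2468013579 1122334455 5544332211"
)

guess <- guess_author(corpus, "the quick brown fox jumps over the lazy dog")
```

Note that gzip only looks back about 32 KB, so with long essays the
reference text for each author may need to be trimmed for the comparison to
be meaningful.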

Greg Snow

2005-Jun-13 16:02 UTC

[R] Essay identification

This topic is sometimes called wordprinting or stylometry.  The spring
2003 issue of Chance magazine had several articles on the topic.

A colleague and I, along with various graduate students, have been working
on a Perl program to extract many of the common statistics used in
wordprinting (counts/percentages of non-contextual words, word pattern
ratios, vocabulary richness).  The data can then be loaded into R (or any
other stats package) to be analyzed.
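
For readers who want to try a few such statistics directly in R, here is a
rough sketch. It is not the Perl program described above. The helper names
`tokenize` and `style_stats`, the short function-word list and the sample
sentence are all assumptions for illustration, and word pattern ratios are
not covered.

```r
# Split a text into lowercase word tokens.
tokenize <- function(text) {
  tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
  tokens[nzchar(tokens)]
}

# A few common stylometric statistics for one text.
style_stats <- function(text, function_words) {
  tokens <- tokenize(text)
  n <- length(tokens)

  # Percentage of tokens that are each non-contextual (function) word.
  fw_counts <- table(factor(tokens, levels = function_words))
  fw_pct <- 100 * as.numeric(fw_counts) / n
  names(fw_pct) <- paste0("pct_", function_words)

  c(fw_pct,
    # Vocabulary richness: distinct words per token.
    type_token_ratio = length(unique(tokens)) / n,
    # Share of tokens that are words used exactly once.
    hapax_ratio = sum(table(tokens) == 1) / n,
    mean_word_length = mean(nchar(tokens)))
}

# Hypothetical function-word list and sample text.
function_words <- c("the", "and", "a", "of")
stats <- style_stats("The cat and the dog and the bird saw a fish",
                     function_words)
```

Calling style_stats() on each essay and binding the results with rbind()
gives a data matrix of the kind that can be fed to a classifier.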

The program is currently in a beta state (usable, but we may add more
features and documentation), but I can send a copy to anyone who is
interested. Please specify whether you have Perl or need a standalone copy
(Windows only).

Hope this helps,

Greg Snow, Ph.D.
Statistical Data Center, LDS Hospital
Intermountain Health Care
greg.snow at ihc.com
(801) 408-8111
