What is wrong with lsa? If it comes close, you could contact the author of that
package and discuss better options, if no one here can help. Your latest
definition just restates "summarization", adding only that you want complete
sentences. If you can cite a specific algorithm or implementation and go to a
SIG list, you may get more help.
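For reference, the core idea of latent semantic analysis fits in a few lines of base R: build a term-document count matrix, take a truncated SVD with svd(), and work with the documents in the reduced k-dimensional space (for example, clustering them there). This is a minimal sketch under simplifying assumptions (naive tokenization, raw counts with no weighting, and a hypothetical function name, lsa_embed); it is not the lsa package's API.

```r
# Minimal LSA sketch in base R; lsa_embed is a hypothetical helper name.
# Assumptions: naive tokenization on non-letters, raw term counts (no tf-idf).
lsa_embed <- function(docs, k = 2) {
  # Tokenize each document into lower-case words, dropping empty strings.
  tokens <- lapply(docs, function(d) {
    w <- tolower(unlist(strsplit(d, "[^A-Za-z']+")))
    w[nchar(w) > 0]
  })
  vocab <- sort(unique(unlist(tokens)))
  # Term-document count matrix: one row per term, one column per document.
  tdm <- sapply(tokens, function(w) as.numeric(table(factor(w, levels = vocab))))
  rownames(tdm) <- vocab
  # Truncated SVD: keep only the k leading singular vectors.
  s <- svd(tdm, nu = k, nv = k)
  # Document coordinates in the latent space: V_k %*% diag(d_k).
  s$v %*% diag(s$d[seq_len(k)], nrow = k)
}
```

Each row of the result is one document; something like kmeans(lsa_embed(docs, k), centers = 2) would then cluster the documents in the reduced space instead of on the full word counts.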
-----Original Message-----
From: Ravishankar Rajagopalan <vioravis@gmail.com>
Date: Thu, 2 Jun 2011 16:29:59
To: Mike Marchywka <marchywka@hotmail.com>
Cc: <r-help@r-project.org>
Subject: Re: [R] Text Summarization
Hi Mike,
Thanks for your inputs.
Well, this is not really a research topic :) There is software already
available to do text summarization. I was just wondering whether we could do
it in R itself.
The document you sent is exactly what I referred to before I sent my first
email.
Below Table 1 in that paper, they have defined Summarization as "summarization
of important concepts in a text. Typically these are high-frequency terms".
As I mentioned in my first post, I don't see summarization as just a bag of
high-frequency words. It is usually a summary made up of meaningful sentences.
Also, there is no reference to summarization after that in the document. It
doesn't refer to any R package/command to do what I am looking for :(
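To make the distinction concrete: an extractive summarizer returns whole sentences from the text rather than a list of frequent terms. One simple technique, frequency-based sentence extraction, scores each sentence by the average document-wide frequency of its content words and keeps the top n in their original order. The sketch below is base R only and assumes naive sentence splitting on ".", "!" and "?", plus a small hypothetical stop-word list; summarize_text is a hypothetical name, not a function from tm or any other package.

```r
# Frequency-based extractive summarizer, base R only.
# summarize_text and stop_words are hypothetical, for illustration.
summarize_text <- function(text, n = 2) {
  # Assumption: a sentence ends with . ! or ? followed by whitespace.
  sentences <- unlist(strsplit(text, "(?<=[.!?])\\s+", perl = TRUE))
  # Tokenize each sentence into lower-case words.
  tokens <- lapply(sentences, function(s) {
    w <- tolower(unlist(strsplit(s, "[^A-Za-z']+")))
    w[nchar(w) > 0]
  })
  # Small illustrative stop-word list (not exhaustive).
  stop_words <- c("the", "a", "an", "of", "to", "and", "is", "in", "it", "that")
  all_words <- unlist(tokens)
  # Document-wide frequency of each content word.
  freq <- table(all_words[!all_words %in% stop_words])
  # Score each sentence by the mean frequency of its content words.
  scores <- vapply(tokens, function(w) {
    w <- w[!w %in% stop_words]
    if (length(w) == 0) 0 else mean(as.numeric(freq[w]))
  }, numeric(1))
  # Keep the n best sentences, restored to their original order.
  keep <- sort(order(scores, decreasing = TRUE)[seq_len(min(n, length(sentences)))])
  sentences[keep]
}
```

Calling summarize_text(doc, n = 3) on a long document returns its three highest-scoring sentences verbatim, which is the "meaningful sentences" output described above rather than a bag of words. Real systems add better sentence splitting, stemming and redundancy checks.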
Thanks again for your help.
Regards,
Ravishankar Rajagopalan
On Wed, Jun 1, 2011 at 4:49 PM, Mike Marchywka
<marchywka@hotmail.com> wrote:
> ________________________________
> > Date: Wed, 1 Jun 2011 10:01:14 +0530
> > Subject: Re: [R] Text Summarization
> > From: vioravis@gmail.com
> > To: marchywka@hotmail.com
> > CC: r-help@r-project.org
> >
> > Mike,
> >
> > This is what I am looking for.
> >
> > http://en.wikipedia.org/wiki/Automatic_summarization
> >
> > I want to obtain a summary of a huge document as meaningful sentences.
> > I do not want a bag of words as the output. I have thousands of documents,
> > each one running to 3-4 pages. I plan to use R to do
> > clustering/classification of these documents. Instead of working with
> > the original document, I think it would be better to work with a
> > summary of the documents since this would avoid memory issues.
> >
>
> Well, it seems to be a bit of a research topic, so I presume you are
> looking for starting points rather than a specific final solution.
> I did manage to do a Google search (for which I feel quite
> accomplished, as getting a browser to work anymore is quite a chore:
> Firefox on Hotmail is very slow and IE seems not to like
> downloading Googlecode PDFs, LOL):
>
> http://www.google.com/search?hl=en&q=%2B%22CRAN%22+%22computational+linguistics%22+%22document+summarization%22
>
> The first hit I get is something called "Text Mining Infrastructure in R"
> (to which I cannot easily post a link, since the link in the Google hit
> redirects through Google and the browser downloads and opens a temp file...).
>
> For clustering or classification you may not care too much about semantics;
> word frequencies may work, etc. I guess this technique could be used too:
>
> http://en.wikipedia.org/wiki/Latent_semantic_analysis
>
>
> Fridolin Wild (November 23, 2005). "An Open Source LSA Package for R".
> CRAN. Retrieved 2006-11-20.
>
> > Thank you.
> >
> > Ravi
> >
> > On Tue, May 31, 2011 at 10:02 PM, Mike Marchywka wrote:
> >
> > ----------------------------------------
> > > Date: Tue, 31 May 2011 03:25:56 -0700
> > > From: vioravis@gmail.com
> > > To: r-help@r-project.org
> > > Subject: [R] Text Summarization
> > >
> > > Is there a text mining/NLP package in R that could do text summarization?
> > > For example, take a huge text as input and provide a summary of the text.
> > >
> > > In package tm, summarization is defined more as high-frequency terms,
> > > which is not what I want. I actually want a summary of what is present
> > > in the huge volume of text.
> > >
> > Cliff's Notes? Can you define it more precisely? There are some
> > computational linguistics packages, IIRC.
> >
> >
> > > Any help on an R package would be appreciated. Thank you.
> > >
> > > Ravi
> > >
> > >
> > > ______________________________________________
> > > R-help@r-project.org mailing list
> > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > > and provide commented, minimal, self-contained, reproducible code.
> >
>