Dear all,

I have a word frequency list from a corpus (say, in .csv), where the first column is a word and the second is the occurrence frequency of that word in the corpus. Is it possible to obtain a Burt table (a table crossing all words with each other, i.e., where rows and columns are the words) from that frequency list with R? I'm exploring the "ca" package but I'm not able to solve this detail.

Thank you very much!

Joan-Josep Vallbé
On 29/03/2009 7:02 AM, Joan-Josep Vallbé wrote:
> Dear all,
>
> I have a word frequency list from a corpus (say, in .csv), where the
> first column is a word and the second is the occurrence frequency of
> that word in the corpus. Is it possible to obtain a Burt table (a
> table crossing all words with each other, i.e., where rows and columns
> are the words) from that frequency list with R? I'm exploring the "ca"
> package but I'm not able to solve this detail.

No, because you don't have any information on that. You only have marginal counts. You need counts of pairs of words (from the original corpus, or already summarized).

Duncan Murdoch
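A toy illustration of Duncan's point (the corpora and variable names here are an invented example, not from the thread): two corpora can share an identical frequency list yet have different co-occurrence structure, so no Burt table is recoverable from the frequencies alone.

```r
## Two tiny corpora with the same word frequencies (cat: 2, dog: 2)
corpus1 <- c("cat", "dog", "cat", "dog")
corpus2 <- c("cat", "cat", "dog", "dog")

## Identical marginal (frequency-list) counts...
freq1 <- table(corpus1)
freq2 <- table(corpus2)
all(freq1 == freq2)  # TRUE

## ...but different adjacent-pair (lag-1) co-occurrence counts
pairs1 <- table(head(corpus1, -1), tail(corpus1, -1))
pairs2 <- table(head(corpus2, -1), tail(corpus2, -1))
all(pairs1 == pairs2)  # FALSE
```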
Maybe not terribly hard, depending on exactly what you need. Suppose you turn your text into a character vector 'mytext' of words. Then for a table of words appearing delta words apart (ordered), you can table mytext against itself with a lag:

nwords <- length(mytext)
burttab <- table(mytext[-(1:delta)], mytext[-(nwords + 1 - (1:delta))])

Add to its transpose and sum over delta up to your maximum distance apart. If you want only words appearing near each other within the same sentence (or some other unit), pad out the sentence break with at least delta instances of a dummy spacer:

the cat chased the greedy rat SPACER SPACER SPACER the dog chased the clever cat

This will count all pairings at distance delta; if you want to count only those for which this was the NEAREST co-occurrence (so "the cat and the rat chased the dog" would count as two at delta = 3 but not one at delta = 6) it will be trickier and I'm not sure this approach can be modified to handle it.

> Date: Sun, 29 Mar 2009 22:20:15 -0400
> From: "Murray Cooper" <myrmail at earthlink.net>
> Subject: Re: [R] Burt table from word frequency list
>
> The usual approach is to count the co-occurrence within so many words of
> each other. Typical is between 5 words before and 5 words after a
> given word. So for each word in the document, you look for the
> occurrence of all other words within -5 -4 -3 -2 -1 0 1 2 3 4 5 words.
> Depending on the language and the question being asked certain words
> may be excluded.
>
> This is not a simple function! I don't know if anyone has done a
> package for this type of analysis, but with over 2000 packages floating
> around you might get lucky.
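Putting the lag-table suggestion above together in one place: a minimal sketch, assuming a helper name (burt_table), a spacer convention, and a default window of 5 that are mine, not from the thread. Summing lags 1..max_delta and adding the transpose gives exactly the "so many words before and after" window count described in the quoted message.

```r
## Sketch: symmetric co-occurrence (Burt-style) table from a word vector,
## counting pairs up to max_delta words apart; sentence breaks should be
## padded with at least max_delta copies of the spacer token.
burt_table <- function(words, max_delta = 5, spacer = "SPACER") {
  nwords <- length(words)
  levs <- sort(unique(words))
  out <- NULL
  for (delta in seq_len(max_delta)) {
    ## pair each word with the word 'delta' positions later
    tab <- table(factor(words[1:(nwords - delta)], levels = levs),
                 factor(words[(1 + delta):nwords], levels = levs))
    tab <- tab + t(tab)                       # add to its transpose
    out <- if (is.null(out)) tab else out + tab
  }
  ## drop the dummy spacer row/column if present
  keep <- rownames(out) != spacer
  out[keep, keep, drop = FALSE]
}

mytext <- c("the", "cat", "chased", "the", "greedy", "rat",
            "SPACER", "SPACER", "SPACER",
            "the", "dog", "chased", "the", "clever", "cat")
b <- burt_table(mytext, max_delta = 3)
```

The result is symmetric, and the SPACER padding keeps words from the two sentences from being counted as co-occurring.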