2010 Sep 29
Transforming/appending data (words in IMDB)
...r- or under-represented in a particular category (Rating × Genre). I was planning to estimate this with a G-test, for what it's worth. But my basic question here is about transforming and appending the data: how do I go from these columns:
Film | Genre1 | Genre2 | Genre3 | Reviewer | Rating | Word | Word_ct
to these:
Word | Genre | Rating | Word_ct | Word_ct_in_genre | Word_ct_in_Rating | Expected_word_ct | G-test-score
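One possible reading of the target table, sketched in Python under stated assumptions. Each film's up to three genre columns are "melted" into one Genre column, and counts are summed per (Word, Genre, Rating). The expected count treats genre and rating as independent within each word's own Genre × Rating table: Expected = Word_ct_in_genre × Word_ct_in_Rating / (total count of that word). The per-cell G score is that cell's contribution 2·O·ln(O/E) to the G statistic. The function name `word_genre_rating_table`, the choice to count a word once per genre for multi-genre films, and this definition of "expected" are all assumptions, not something the question specifies.

```python
import math
from collections import Counter


def word_genre_rating_table(rows):
    """Hypothetical helper: rows are dicts with keys Film, Genre1, Genre2,
    Genre3, Reviewer, Rating, Word, Word_ct (values as strings)."""
    # Melt Genre1..Genre3 into one genre per row and sum counts per cell.
    # Assumption: a multi-genre film contributes its count to every genre.
    cell = Counter()  # (word, genre, rating) -> observed count
    for r in rows:
        n = int(r["Word_ct"])
        if n <= 0:
            continue
        genres = {g for g in (r["Genre1"], r["Genre2"], r["Genre3"]) if g}
        for g in genres:
            cell[(r["Word"], g, r["Rating"])] += n

    # Marginal totals within each word's Genre x Rating table.
    in_genre, in_rating, total = Counter(), Counter(), Counter()
    for (w, g, rt), n in cell.items():
        in_genre[(w, g)] += n
        in_rating[(w, rt)] += n
        total[w] += n

    out = []
    for (w, g, rt), n in sorted(cell.items()):
        # Expected count if genre and rating were independent for this word.
        expected = in_genre[(w, g)] * in_rating[(w, rt)] / total[w]
        out.append({
            "Word": w, "Genre": g, "Rating": rt, "Word_ct": n,
            "Word_ct_in_genre": in_genre[(w, g)],
            "Word_ct_in_Rating": in_rating[(w, rt)],
            "Expected_word_ct": expected,
            # This cell's term of G = 2 * sum(O * ln(O / E)); can be negative.
            "G_score": 2 * n * math.log(n / expected),
        })
    return out
```

Summing `G_score` over all cells of one word gives that word's G statistic for genre/rating independence (empty cells contribute 0). A positive cell term suggests the word is over-represented in that cell, a negative one under-represented.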
The actual amount of data is enormous (10 files of roughly 1.5 GB each, so about 15 GB in total), and I suspect I'll have to learn the bigmemory package or something like it. But for n...
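An alternative to loading everything into memory (a swapped-in technique, not bigmemory) is a single streaming pass that keeps only the aggregated (Word, Genre, Rating) counts. Memory then grows with the number of distinct keys rather than with the roughly 15 GB of raw rows. This minimal sketch assumes each file is CSV with a header row using the column names above; the function name `aggregate_streams` and the comma default for `delimiter` are hypothetical, since the real file format isn't stated.

```python
import csv
import io
from collections import Counter
from typing import Iterable, TextIO


def aggregate_streams(streams: Iterable[TextIO], delimiter: str = ",") -> Counter:
    """Hypothetical helper: one pass over each CSV stream, summing Word_ct
    per (Word, Genre, Rating). Raw rows are never held in memory."""
    cell = Counter()
    for stream in streams:
        for r in csv.DictReader(stream, delimiter=delimiter):
            n = int(r["Word_ct"])
            if n <= 0:
                continue
            # Empty genre fields come back as "" and are dropped.
            for g in {r["Genre1"], r["Genre2"], r["Genre3"]} - {""}:
                cell[(r["Word"], g, r["Rating"])] += n
    return cell
```

For real files, something like `with open(path, newline="") as f:` per file (or a generator yielding opened files) would feed the same function. The resulting counter can then go through the marginal and expected-count step.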