ntyson at clovermail.net
2011-Jun-09 19:48 UTC
[R] Coercing Output from mget() into Proper Data Frame
Hello R-philes:

I have the following function that gets the output of mget() and converts it to a data frame to return. What I am finding is that the dimensions are wrong. Basically, I get:

  bridesmaid wed  u see  m gt lt like love X.0 dress pagetrack one go X3 get
1         56  35 27  30 24 20 20   23   28  17    25        16  16 28 15  26

Instead, I want something like:

[1] bridesmaid 56

In other words, I want the word in the first column and the frequency in the second column.

Any help would be very much appreciated.

Regards,

Na'im

library(Rstem)

# make a data frame of stems and their frequencies
stem_freq_list <- function(freqFile) {
    stem_dict <- new.env(parent=emptyenv(), hash=TRUE)
    freq_dist <- read.csv(freqFile,header=TRUE)
    words <- as.character(freq_dist[,1])
    freqs <- as.numeric(freq_dist[,2])
    stems <- wordStem(words, language="english")
    uniq_stems <- c()

    # make a hash table of stems and their frequencies
    for (i in 1:length(words)) {
        word <- words[i]; stem <- stems[i]; freq <- freqs[i]
        if (exists(stem, envir=stem_dict)) {
            cnt <- get(stem, envir=stem_dict)
            cnt <- cnt + freqs[i]
            assign(stem,cnt,envir=stem_dict)
        } else {
            assign(stem, freq, envir=stem_dict)
            uniq_stems <- append(uniq_stems, stem)
        }
    }

    # return data frame of stems and their frequencies
    stem_freqs_list <- mget(uniq_stems,stem_dict)
    stem_freqs <- do.call(rbind,stem_freqs_list)
    return(stem_freqs_list)
}
That's not enough information for us to be able to help you, since there's no reproducible code.

Here's the crucial bit:

> stem_freqs_list <- mget(uniq_stems,stem_dict)
> stem_freqs <- do.call(rbind,stem_freqs_list)

What does stem_freqs_list look like? What does stem_freqs look like? dim() and str() would both be helpful here.

Sarah

On Thu, Jun 9, 2011 at 3:48 PM, <ntyson at clovermail.net> wrote:
> Hello R-philes:
>
> I have the following function that gets the output of mget() and converts it
> to a data frame to return. What I am finding is that the dimensions are
> wrong. Basically, I get:
>
>   bridesmaid wed  u see  m gt lt like love X.0 dress pagetrack one go X3 get
> 1         56  35 27  30 24 20 20   23   28  17    25        16  16 28 15  26
>
> Instead, I want something like:
>
> [1] bridesmaid 56
>
> In other words, I want the word in the first column and the frequency in the
> second column.
>
> Any help would be very much appreciated.
>
> Regards,
>
> Na'im
>
> library(Rstem)
>
> # make a data frame of stems and their frequencies
> stem_freq_list <- function(freqFile) {
>     stem_dict <- new.env(parent=emptyenv(), hash=TRUE)
>     freq_dist <- read.csv(freqFile,header=TRUE)
>     words <- as.character(freq_dist[,1])
>     freqs <- as.numeric(freq_dist[,2])
>     stems <- wordStem(words, language="english")
>     uniq_stems <- c()
>
>     # make a hash table of stems and their frequencies
>     for (i in 1:length(words)) {
>         word <- words[i]; stem <- stems[i]; freq <- freqs[i]
>         if (exists(stem, envir=stem_dict)) {
>             cnt <- get(stem, envir=stem_dict)
>             cnt <- cnt + freqs[i]
>             assign(stem,cnt,envir=stem_dict)
>         } else {
>             assign(stem, freq, envir=stem_dict)
>             uniq_stems <- append(uniq_stems, stem)
>         }
>     }
>
>     # return data frame of stems and their frequencies
>     stem_freqs_list <- mget(uniq_stems,stem_dict)
>     stem_freqs <- do.call(rbind,stem_freqs_list)
>     return(stem_freqs_list)
> }
>

--
Sarah Goslee
http://www.functionaldiversity.org
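
The dim() and str() checks suggested above can be tried without the original frequency file. What follows is only a minimal sketch, not the poster's actual data: it assumes a toy environment holding three of the stems and counts shown in the post, and the data frame name stem_df with its column names stem and freq is hypothetical. The last step shows one possible way to get a word column and a frequency column from the list that mget() returns.

```r
# Toy stand-in for the poster's stem_dict (assumption: three stems
# and counts copied from the output shown in the post).
stem_dict <- new.env(parent = emptyenv(), hash = TRUE)
assign("bridesmaid", 56, envir = stem_dict)
assign("wed", 35, envir = stem_dict)
assign("u", 27, envir = stem_dict)
uniq_stems <- c("bridesmaid", "wed", "u")

# mget() returns a named list with one element per stem.
stem_freqs_list <- mget(uniq_stems, envir = stem_dict)
str(stem_freqs_list)

# rbind() over that list gives a matrix with one row per stem
# and a single column; the stems become row names.
stem_freqs <- do.call(rbind, stem_freqs_list)
dim(stem_freqs)
str(stem_freqs)

# One possible two-column layout (hypothetical names stem_df, stem, freq):
# the list names supply the words, the unlisted values the frequencies.
stem_df <- data.frame(stem = names(stem_freqs_list),
                      freq = unlist(stem_freqs_list, use.names = FALSE),
                      stringsAsFactors = FALSE)
stem_df
```

Note that the function in the post returns stem_freqs_list rather than stem_freqs, so whatever the caller does with the result starts from the named list. Converting such a list directly to a data frame gives one row with one column per stem, which would be consistent with the output shown in the post, though the post does not say how that output was produced.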