Yes, I am a newbie.

I have a data.frame (MyTable) of 1445846 rows and 15 columns with character data, and a character vector (MyVector) of 473491 elements.

I simply want to get a data.frame with the count of how many times each element of MyVector appears in MyTable.

I've tried a loop:

for (i in 1 : length (myvector)) sum (MyTable== i)

but it crashes my computer.

I've also tried something like

x <- 1 : length (MyVector)
apply (MyTable , 1 , function(x) {sum (MyTable ==x)}

but it doesn't work.

Any idea? Thank you. Any suggestion is super welcome.

Marianna

--
View this message in context: http://r.789695.n4.nabble.com/loop-for-a-large-database-tp4422052p4422052.html
Sent from the R help mailing list archive at Nabble.com.
On Feb 26, 2012, at 7:13 AM, mari681 wrote:

> I've tried a loop with : for (i in 1 : length (myvector)) sum (MyTable== i)

In that instance "i" is a number and probably would not match anything in a table of character data.

> but it crashes my computer.

Since you never offered the requested information about your objects, this is guesswork. If MyVector is one of the 15 columns in MyTable, then this has a good chance of working:

table(MyTable$MyVector)

If, on the other hand, they are separate and you want to ignore the elements not in MyVector, then assign the result of a table() operation to an object and use match() to pick out the tabulated values.

In the future, please at least offer the results of str(MyTable).

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
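A minimal sketch of the table()-then-match() idea, assuming MyTable holds only character columns and MyVector is a character vector. The small MyTable and MyVector below are hypothetical stand-ins, not the poster's data. Tags that never occur come back from match() as NA and are set to 0 here.

```r
# Hypothetical toy stand-ins for MyTable and MyVector
MyTable <- data.frame(V1 = c("green", "ny", "green"),
                      V2 = c("saul", "green", "pink"),
                      stringsAsFactors = FALSE)
MyVector <- c("green", "pink", "wood")

# Tabulate every cell of the table once
tab <- table(unlist(MyTable, use.names = FALSE))

# Position of each tag among the tabulated names; tags never seen give NA
idx <- match(MyVector, names(tab))
counts <- as.vector(tab[idx])
counts[is.na(counts)] <- 0L

# One row per element of MyVector, in the original order
result <- data.frame(tag = MyVector, count = counts)
print(result)
```

The table is built in a single pass over all cells, so the cost no longer multiplies by the length of MyVector.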
Untested due to lack of data, but this should work with a loop:

out <- vector("list", length = length(MyVector))
for (i in seq_along(MyVector)) {
  # count the cells of MyTable equal to the i-th tag
  out[[i]] <- data.frame(tag = MyVector[i], count = sum(MyTable == MyVector[i]))
}
# stack the per-tag rows into one data.frame
result <- do.call(rbind, out)
On Sun, Feb 26, 2012 at 04:13:49AM -0800, mari681 wrote:
> I've tried a loop with : for (i in 1 : length (myvector)) sum (MyTable== i)
>
> but it crashes my computer.

Hi.

As David pointed out, you probably want to compute

sum(MyTable == myvector[i])

and not sum(MyTable == i). Also, I would expect the results to be stored somewhere, for example

numOccur <- rep(NA, times = length(myvector))
for (i in 1:length(myvector)) numOccur[i] <- sum(MyTable == myvector[i])

What do you see on the crashing computer? I would expect it to run for a long time, but not to crash. Try running your code on a smaller part of the data to test the efficiency of different approaches.

How many different strings are in your data? If there are many repeated strings, then it may be better to first compute their frequency table, look up the strings from "myvector" in this table and sum the frequencies.

Does your data frame consist of character vectors or of factors? This may be seen by testing class(MyTable[[1]]).

Petr Savicky.
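A sketch of the "test on a smaller part of the data" advice: it checks the column class, then times the per-element loop against a single frequency table on the first 1000 rows. The randomly generated MyTable, the tag pool and the subset size are all hypothetical. The loop scans every cell once per element of MyVector, while the table approach tabulates the cells once and then only does lookups, so the gap should widen as MyVector grows.

```r
# Hypothetical stand-in for MyTable: 2000 rows x 15 columns of random tags,
# with "0" as the empty-slot filler seen in the posted sample
set.seed(1)
pool <- c("green", "pink", "wood", "life", "house", "fall", "ny", "0")
MyTable <- as.data.frame(matrix(sample(pool, 2000 * 15, replace = TRUE), ncol = 15),
                         stringsAsFactors = FALSE)
MyVector <- c("life", "wood", "pink", "house", "green", "fall")

# Character vectors or factors?
print(class(MyTable[[1]]))

# Try the approaches on a subset first
part <- MyTable[1:1000, ]

# Approach 1: one full scan of the subset per element of MyVector
t_loop <- system.time({
  numOccur <- integer(length(MyVector))
  for (i in seq_along(MyVector)) numOccur[i] <- sum(part == MyVector[i])
})

# Approach 2: tabulate every cell once, then look the tags up by name
t_table <- system.time({
  tt <- table(unlist(part, use.names = FALSE))
  numOccur2 <- as.vector(tt[match(MyVector, names(tt))])
  numOccur2[is.na(numOccur2)] <- 0L
})

# Elapsed seconds for each approach
print(c(loop = t_loop[["elapsed"]], table = t_table[["elapsed"]]))
```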
SORRY!

The data in MyTable are tagsets of photos, like this:

          V1         V2       V3      V4      V5       V6        V7   V8
230    green nailpolish   barrym       0       0        0         0    0
231       ny      green brooklyn cleanup   clean  gowanus volunteer  gcc
232    green       saul  lecture       0       0        0         0    0
233    green     colors    cores  market colores marakesh   mercado malu
234       ny      green brooklyn cleanup   clean  gowanus volunteer  gcc
235    green       saul  lecture       0       0        0         0    0
236 portrait        pet    white   green     cat    canon    square  eos
                         V9   V10  V11      V12 V13 V14 V15
230                       0     0    0        0   0   0   0
231 gowanuscanalconservancy     0    0        0   0   0   0
232                       0     0    0        0   0   0   0
233               malugreen maroc souk marrocos   0   0   0
234 gowanuscanalconservancy     0    0        0   0   0   0
235                       0     0    0        0   0   0   0
236                      is  eyes mark   taiwan  ii mk2  5d

MyVector, on the other hand, is a list of tags (not any particular column of MyTable) whose frequency in MyTable has to be computed, like this:

[1] "life"  "wood"  "pink"  "house" "green" "fall"

Thanks!!

Marianna
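To make the question reproducible (str(MyTable) was requested earlier in the thread), the first three posted rows can be rebuilt as a data.frame. This is a transcription of rows 230 to 232 only, and it assumes the columns are character rather than factor and that "0" is a literal string marking an empty tag slot.

```r
# Rows 230-232 of the posted sample, 15 tag slots each ("0" = empty slot)
rows <- list(
  c("green", "nailpolish", "barrym", rep("0", 12)),
  c("ny", "green", "brooklyn", "cleanup", "clean", "gowanus", "volunteer", "gcc",
    "gowanuscanalconservancy", rep("0", 6)),
  c("green", "saul", "lecture", rep("0", 12))
)

# Bind into a character matrix, name rows/columns as in the printout
m <- do.call(rbind, rows)
colnames(m) <- paste0("V", 1:15)
rownames(m) <- c("230", "231", "232")

MyTable <- as.data.frame(m, stringsAsFactors = FALSE)
str(MyTable)
```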
On Sun, Feb 26, 2012 at 04:13:49AM -0800, mari681 wrote:
> I want simply to get a data.frame with the count of how many times each
> element of MyVector appears in MyTable.

Hi.

Try the following first.

out <- unclass(table(factor(MyTable[[1]], levels=myvector)))

The output should be a table of frequencies of the components of "myvector" in the first column of "MyTable". If this works for data of your size, then there are several possible ways to compute the frequencies in all columns. For example, concatenate all columns into a single vector and apply the above to this concatenation as follows.

x <- c(as.matrix(MyTable))
out <- unclass(table(factor(x, levels=myvector)))

Here, "out" is a vector of the same length as "myvector", and out[i] is the frequency of myvector[i] in "MyTable".

Hope this helps.

Petr Savicky.
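A related sketch that avoids building a factor: match() maps every cell to its position in MyVector (NA for tags not in MyVector), and tabulate() counts those positions. This is a swapped-in alternative to the factor()/table() approach, not what was posted. The toy data are hypothetical, and it assumes MyVector has no duplicated tags, since match() only ever returns the first matching position.

```r
# Hypothetical toy inputs
MyTable <- data.frame(V1 = c("green", "ny", "green"),
                      V2 = c("saul", "green", "0"),
                      stringsAsFactors = FALSE)
MyVector <- c("life", "wood", "pink", "house", "green", "fall")

# Flatten every cell of the table into one character vector
x <- unlist(MyTable, use.names = FALSE)

# Position of each cell's value in MyVector (NA if absent); tabulate()
# counts positions 1..nbins and silently ignores the NAs
counts <- tabulate(match(x, MyVector), nbins = length(MyVector))

result <- data.frame(tag = MyVector, count = counts)
print(result)
```

Tags that never occur get a count of 0, and the result is already the data.frame asked for, in the order of MyVector.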
Hi

> while data of MyVector is a list of tags (none of the columns in particular)
> whose frequency in MyTable has to be computed. Like this:
>
> [1] "life" "wood" "pink" "house" "green" "fall"

What about converting your data frame to a matrix and using table()?

set.seed(111)
x <- sample(letters, 200, replace=TRUE)
y <- letters[3:6]
dim(x) <- c(20,10)
dd <- data.frame(x)
tt <- table(as.matrix(dd))
tt[names(tt) %in% y]

 c  d  e  f
13  5  8  3

Regards
Petr