From my book on corpus linguistics with R:

# (10) Imagine you have two vectors a and b such that
a<-c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
b<-c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")

# Of these vectors, you can create frequency lists by writing
freq.list.a<-table(a); freq.list.b<-table(b)

# How do you merge these two frequency lists without merging the two
# vectors first? More specifically, if I delete a and b from your
# memory,
rm(a); rm(b)
# how do you generate the following table only from freq.list.a and
# freq.list.b, i.e., without any reference to a and b themselves? Before
# you complain about this question as being unrealistic, consider the
# possibility that you generated the frequency lists of two corpora
# (here, a and b) that are so large that you cannot combine them into
# one (a.and.b<-c(a, b)) and generate a frequency list of that combined
# vector (table(a.and.b)) ...
# joint.freqs
# a b d e f g i j
# 3 1 3 1 5 5 1 1

joint.freqs<-vector(length=length(sort(unique(c(names(freq.list.a), names(freq.list.b))))))
# You generate an empty vector joint.freqs (i) that is as long as there
# are different types in both a and b (but note that, as requested, this
# information is not taken from a or b, but from their frequency lists) ...
names(joint.freqs)<-sort(unique(c(names(freq.list.a), names(freq.list.b))))
# ... and (ii) whose elements have these different types as names.
joint.freqs[names(freq.list.a)]<-freq.list.a
# The elements of the new vector joint.freqs that have the same names as
# the frequencies in the first frequency list are assigned the respective
# frequencies.
joint.freqs[names(freq.list.b)]<-joint.freqs[names(freq.list.b)]+freq.list.b
# The elements of the new vector joint.freqs that have the same names as
# the frequencies in the second frequency list are assigned the sum of
# the values they already have (either the ones from the first frequency
# list or just zeroes) and the respective frequencies.
joint.freqs # look at the result

# Another shorter and more elegant solution was proposed by Claire
# Crawford (but uses a function which will only be introduced later in
# the book)
freq.list.a.b<-c(freq.list.a, freq.list.b)
# first the two frequency lists are merged into a single vector ...
joint.freqs<-as.table(tapply(freq.list.a.b, names(freq.list.a.b), sum))
# ... and then the sums of all numbers that share the same names are
# computed
joint.freqs # look at the result

# The shortest, but certainly not memory-efficient way to do this
# involves just using the frequency lists to create one big vector with
# all elements and tabulate that.
table(c(rep(names(freq.list.a), freq.list.a), rep(names(freq.list.b), freq.list.b)))
# kind of cheating but possible with short vectors ...

HTH,
STG
--
Stefan Th. Gries
-----------------------------------------------
University of California, Santa Barbara
http://www.linguistics.ucsb.edu/faculty/stgries
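As a quick sanity check, the three approaches described above can be run side by side on the example data. This is only a sketch: the names sol1, sol2, sol3, expected and same() are introduced here for illustration and do not appear in the original post, and integer() is used instead of vector() so that the starting zeroes are explicit.

```r
# Example data from the post; only the frequency lists are kept.
a <- c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
b <- c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")
freq.list.a <- table(a); freq.list.b <- table(b)
rm(a, b)

# Approach 1: pre-allocate a named vector of zeroes, then fill and add.
types <- sort(unique(c(names(freq.list.a), names(freq.list.b))))
sol1 <- integer(length(types))
names(sol1) <- types
sol1[names(freq.list.a)] <- freq.list.a
sol1[names(freq.list.b)] <- sol1[names(freq.list.b)] + freq.list.b

# Approach 2: concatenate both lists and sum the values by name.
freq.list.a.b <- c(freq.list.a, freq.list.b)
sol2 <- tapply(freq.list.a.b, names(freq.list.a.b), sum)

# Approach 3: expand back to tokens and tabulate (memory-hungry).
sol3 <- table(c(rep(names(freq.list.a), freq.list.a),
                rep(names(freq.list.b), freq.list.b)))

# The table requested in the post.
expected <- c(a = 3L, b = 1L, d = 3L, e = 1L, f = 5L, g = 5L, i = 1L, j = 1L)

# Hypothetical helper: compares names and counts, ignoring the object class.
same <- function(x) {
  length(x) == length(expected) &&
    all(names(x) == names(expected)) &&
    all(as.vector(x) == expected)
}

c(same(sol1), same(sol2), same(sol3))
```

The comparison deliberately ignores the class of the result, because the first approach yields a named vector, the second a one-dimensional array, and the third a table.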
Maybe it is longer, but it is also more general: it issues an error if the tables are not one-dimensional, and that is where most of the function's extra lines are. Otherwise it is the same as your first solution. The second one has the problem you've mentioned.

Rui Barradas

On 20-09-2012 16:46, Stefan Th. Gries wrote:
> Yes, but this is way longer than any of the three solutions I sent, is it not?
> STG
> --
> Stefan Th. Gries
> -----------------------------------------------
> University of California, Santa Barbara
> http://www.linguistics.ucsb.edu/faculty/stgries
> -----------------------------------------------
>
>
> On Thu, Sep 20, 2012 at 8:43 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>> Hello,
>>
>> The trick is to use the table's dimnames attributes. Try the following.
>>
>> addTables <- function(t1, t2){
>>     dn1 <- dimnames(t1)
>>     dn2 <- dimnames(t2)
>>     if(length(dn1) == 1){
>>         dn1 <- unlist(dn1)
>>         dn2 <- unlist(dn2)
>>         dns <- sort(unique(c(dn1, dn2)))
>>         tsum <- array(integer(length(dns)), dim = length(dns))
>>         dimnames(tsum) <- list(dns)
>>         tsum[dn1] <- t1
>>         tsum[dn2] <- tsum[dn2] + t2
>>     }else
>>         # report the actual number of dimensions of t1
>>         stop(paste("table with", length(dn1), "dimensions is not implemented."))
>>     tsum
>> }
>>
>>
>> a <- c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
>> b <- c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")
>> ta <- table(a)
>> tb <- table(b)
>> rm(a, b)
>>
>> addTables(ta, tb)
>>
>> Hope this helps,
>>
>> Rui Barradas
>> On 20-09-2012 15:57, Stefan Th. Gries wrote:
>>> [...]
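A two-table helper of this kind extends naturally to any number of frequency lists, which fits the large-corpus motivation in the original post. The following is a minimal sketch under stated assumptions: add_two() is a hypothetical helper built on the tapply() idea from the original post (it is not Rui's function), and the third chunk is made-up data added purely for illustration.

```r
# Hypothetical helper: add two 1-dimensional frequency lists by name.
add_two <- function(t1, t2) {
  both <- c(t1, t2)                        # named counts from both lists
  sums <- tapply(both, names(both), sum)   # sum the counts that share a name
  setNames(as.vector(sums), names(sums))   # return a plain named vector
}

# Three corpus chunks; the third one is invented for this example.
chunks <- list(
  c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g"),
  c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g"),
  c("a", "z", "f")
)

# Tabulate each chunk separately, then discard the raw tokens.
freq.lists <- lapply(chunks, table)
rm(chunks)

# Fold the pairwise addition over all frequency lists.
total <- Reduce(add_two, freq.lists)
total
```

Because only one frequency list is added at a time, the raw token vectors never need to be held in memory together.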
On Thu, Sep 20, 2012 at 10:57 AM, Stefan Th. Gries <stgries at gmail.com> wrote:
> [...]
> # Another shorter and more elegant solution was proposed by Claire
> Crawford (but uses a function which will only be introduced later in
> the book)
> freq.list.a.b<-c(freq.list.a, freq.list.b) # first the two frequency
> lists are merged into a single vector ...
> joint.freqs<-as.table(tapply(freq.list.a.b, names(freq.list.a.b),
> sum)) # ... and then the sums of all numbers that share the same names
> are computed
> joint.freqs # look at the result
> [...]

Try:

rowsum(freq.list.a.b, names(freq.list.a.b))

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
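One detail worth knowing about the rowsum() suggestion: for a vector input it returns a one-column matrix whose row names are the types, not a table or a named vector. A small sketch of how the result could be brought back into the shape used in the original post follows; the name rs is introduced here only for illustration.

```r
# Rebuild the two frequency lists from the example data.
freq.list.a <- table(c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g"))
freq.list.b <- table(c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g"))

# Concatenate the named counts, as in the original post.
freq.list.a.b <- c(freq.list.a, freq.list.b)

# rowsum() sums the values within each group (here: each type name).
rs <- rowsum(freq.list.a.b, names(freq.list.a.b))
dim(rs)   # one row per type, a single column

# Extract the single column to get a named vector of joint frequencies.
joint.freqs <- rs[, 1]
joint.freqs
```

If a table object is needed, as.table(joint.freqs) converts the named vector.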
Hi Stefan,

Thanks for the solutions. Just to add one more:

f.a<-table(a); f.b<-table(b)
c(f.a[!names(f.a)%in%names(f.b)],
  f.b[!names(f.b)%in%names(f.a)],
  xtabs(f.a[names(f.a)%in%names(f.b)]+f.b[names(f.b)%in%names(f.a)]~
        names(f.a[names(f.a)%in%names(f.b)])))
#e i j a b d f g
#1 1 1 3 1 3 5 5

A.K.

----- Original Message -----
From: Stefan Th. Gries <stgries at gmail.com>
To: mcelis at lightminersystems.com
Cc: r-help at r-project.org
Sent: Thursday, September 20, 2012 10:57 AM
Subject: [R] (no subject)

[...]

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.