thr3ads.net - R help - [R] (no subject) [Sep 2012]

Stefan Th. Gries

2012-Sep-20 14:57 UTC

[R] (no subject)

From my book on corpus linguistics with R:

# (10) Imagine you have two vectors a and b such that
a <- c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
b <- c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")

# Of these vectors, you can create frequency lists by writing
freq.list.a <- table(a); freq.list.b <- table(b)

# How do you merge these two frequency lists without merging the two
# vectors first? More specifically, if I delete a and b from your
# memory,
rm(a); rm(b)
# how do you generate the following table only from freq.list.a and
# freq.list.b, i.e., without any reference to a and b themselves? Before
# you complain about this question as being unrealistic, consider the
# possibility that you generated the frequency lists of two corpora
# (here, a and b) that are so large that you cannot combine them into
# one (a.and.b <- c(a, b)) and generate a frequency list of that combined
# vector (table(a.and.b)) ...
# joint.freqs
# a b d e f g i j
# 3 1 3 1 5 5 1 1

# You generate an empty vector joint.freqs (i) that has as many elements
# as there are different types in a and b taken together (but note that,
# as requested, this information is not taken from a or b, but from
# their frequency lists) ...
joint.freqs <- vector(length=length(sort(unique(c(names(freq.list.a),
   names(freq.list.b))))))
# ... and (ii) whose elements have these different types as names.
names(joint.freqs) <- sort(unique(c(names(freq.list.a),
   names(freq.list.b))))
# The elements of the new vector joint.freqs that have the same names as
# the frequencies in the first frequency list are assigned the
# respective frequencies.
joint.freqs[names(freq.list.a)] <- freq.list.a
# The elements of the new vector joint.freqs that have the same names as
# the frequencies in the second frequency list are assigned the sum of
# the values they already have (either the ones from the first frequency
# list or just zeroes) and the respective frequencies.
joint.freqs[names(freq.list.b)] <- joint.freqs[names(freq.list.b)]+freq.list.b
joint.freqs # look at the result

# Another shorter and more elegant solution was proposed by Claire
# Crawford (but uses a function which will only be introduced later in
# the book)
# First the two frequency lists are merged into a single vector ...
freq.list.a.b <- c(freq.list.a, freq.list.b)
# ... and then the sums of all numbers that share the same names are
# computed.
joint.freqs <- as.table(tapply(freq.list.a.b, names(freq.list.a.b), sum))
joint.freqs # look at the result

# The shortest, but certainly not memory-efficient, way to do this
# involves just using the frequency lists to create one big vector with
# all elements and tabulating that.
# Kind of cheating, but possible with short vectors ...
table(c(rep(names(freq.list.a), freq.list.a),
   rep(names(freq.list.b), freq.list.b)))
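
A minimal sketch of how the tapply solution above could be wrapped so that it merges any number of frequency lists. The helper name merge.freq.lists is hypothetical and does not come from the book or the thread; the two tables are rebuilt from the example vectors only to make the block self-contained.

```r
# Hypothetical helper (not from the thread): sum any number of
# one-dimensional frequency tables by type name.
merge.freq.lists <- function(...) {
  all.freqs <- c(...)  # concatenating the tables keeps the type names
  as.table(tapply(all.freqs, names(all.freqs), sum))
}

# The two frequency lists from the example above
freq.list.a <- table(c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g"))
freq.list.b <- table(c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g"))

joint.freqs <- merge.freq.lists(freq.list.a, freq.list.b)
joint.freqs # look at the result
```

Because everything is concatenated before summing, a third or fourth frequency list can simply be passed as a further argument.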

HTH,
STG
--
Stefan Th. Gries
-----------------------------------------------
University of California, Santa Barbara
http://www.linguistics.ucsb.edu/faculty/stgries

Rui Barradas

2012-Sep-20 16:10 UTC

Maybe it is longer, but it's also more general: it issues an error if
the tables are not 1-dim. That's where most of the function's extra
lines are. Otherwise it's the same as your first solution. The second
one has the problem you've mentioned.

Rui Barradas
On 20-09-2012 16:46, Stefan Th. Gries wrote:
> Yes, but this is way longer than any of the three solutions I sent, is it not?
> STG
> --
> Stefan Th. Gries
> -----------------------------------------------
> University of California, Santa Barbara
> http://www.linguistics.ucsb.edu/faculty/stgries
> -----------------------------------------------
>
>
> On Thu, Sep 20, 2012 at 8:43 AM, Rui Barradas <ruipbarradas at sapo.pt> wrote:
>> Hello,
>>
>> The trick is to use the table's dimnames attribute. Try the following.
>>
>> addTables <- function(t1, t2){
>>      dn1 <- dimnames(t1)
>>      dn2 <- dimnames(t2)
>>      if(length(dn1) == 1){
>>          dn1 <- unlist(dn1)
>>          dn2 <- unlist(dn2)
>>          dns <- sort(unique(c(dn1, dn2)))
>>          tsum <- array(integer(length(dns)), dim = length(dns))
>>          dimnames(tsum) <- list(dns)
>>          tsum[dn1] <- t1
>>          tsum[dn2] <- tsum[dn2] + t2
>>      }else
>>          stop(paste("table with", length(dn1), "dimensions is not implemented."))
>>      tsum
>> }
>>
>>
>> a <- c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g")
>> b <- c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g")
>> ta <- table(a)
>> tb <- table(b)
>> rm(a, b)
>>
>> addTables(ta, tb)
>>
>> Hope this helps,
>>
>> Rui Barradas
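
Since addTables takes two tables and returns a named one-dimensional array, it can in principle be folded over a list of more than two frequency lists with Reduce. The sketch below repeats the function from the message above (with length(dn1) in the error message, where the posted code refers to an undefined ndim); the third table tc is a made-up example and not part of the thread.

```r
# addTables as posted in the thread, with the error message fixed
addTables <- function(t1, t2){
    dn1 <- dimnames(t1)
    dn2 <- dimnames(t2)
    if(length(dn1) == 1){
        dn1 <- unlist(dn1)
        dn2 <- unlist(dn2)
        dns <- sort(unique(c(dn1, dn2)))
        tsum <- array(integer(length(dns)), dim = length(dns))
        dimnames(tsum) <- list(dns)
        tsum[dn1] <- t1
        tsum[dn2] <- tsum[dn2] + t2
    }else
        stop(paste("table with", length(dn1), "dimensions is not implemented."))
    tsum
}

ta <- table(c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g"))
tb <- table(c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g"))
tc <- table(c("a", "j", "z"))  # hypothetical third frequency list

# Fold addTables over the list: ((ta + tb) + tc)
total <- Reduce(addTables, list(ta, tb, tc))
total
```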

Gabor Grothendieck

2012-Sep-20 16:13 UTC

On Thu, Sep 20, 2012 at 10:57 AM, Stefan Th. Gries <stgries at gmail.com> wrote:
> freq.list.a.b<-c(freq.list.a, freq.list.b) # first the two frequency
> lists are merged into a single vector ...
Try:

rowsum(freq.list.a.b, names(freq.list.a.b))
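
A short sketch of what this call returns, assuming freq.list.a.b is built as in the second solution of the original post. rowsum gives back a one-column matrix with the types as row names, so taking the first column yields a named vector comparable to joint.freqs. The object names sums and joint.freqs are used here only for illustration.

```r
# Rebuild the frequency lists from the example vectors
freq.list.a <- table(c("d", "d", "j", "f", "e", "g", "f", "f", "i", "g"))
freq.list.b <- table(c("a", "g", "d", "f", "g", "a", "f", "a", "b", "g"))
freq.list.a.b <- c(freq.list.a, freq.list.b)

# rowsum() adds up all values that share the same group label
sums <- rowsum(freq.list.a.b, names(freq.list.a.b))

# The result is a one-column matrix; extract the column as a named vector
joint.freqs <- sums[, 1]
joint.freqs
```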

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

arun

2012-Sep-20 17:10 UTC

Hi Stefan,
Thanks for the solutions.

Just to add one more:
f.a<-table(a); f.b<-table(b)
c(f.a[!names(f.a)%in%names(f.b)],
  f.b[!names(f.b)%in%names(f.a)],
  xtabs(f.a[names(f.a)%in%names(f.b)]+f.b[names(f.b)%in%names(f.a)]~
        names(f.a[names(f.a)%in%names(f.b)])))

#e i j a b d f g 
#1 1 1 3 1 3 5 5 
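
The types in this result come out in a different order from the joint.freqs table in the original question. One possible way to get the alphabetical order back is to index the vector by the order of its names. In the sketch below the values are copied from the output above, and the names res and res.sorted are made up for illustration.

```r
# Values copied from the output shown above
res <- c(e = 1, i = 1, j = 1, a = 3, b = 1, d = 3, f = 5, g = 5)

# Reorder the elements alphabetically by their names
res.sorted <- res[order(names(res))]
res.sorted
```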

A.K.



