thr3ads.net - R help - [R] counting row repetitions without loop [Feb 2008]

Waterman, DG (David)

2008-Feb-06 14:08 UTC

[R] counting row repetitions without loop

Hi,
 
I have a data frame consisting of coordinates on a 10*10 grid, i.e.
> example
    x  y
1   4  5
2   6  7
3   6  6
4   7  5
5   5  7
6   6  7
7   4  5
8   6  7
9   7  6
10  5  6

What I would like to do is return a 10*10 matrix consisting of counts
at each position, so in the above example I would have a matrix where,
for example, cell [4,5] contains 2 and [6,7] contains 3. At the moment I
have implemented this using a for loop over the rows of the data frame,
however the data frames I want to process are very long so the loop
takes many minutes to complete. Can I do this in a more efficient way?
 
Cheers,
David

James Foadi

2008-Feb-06 14:53 UTC

[R] counting row repetitions without loop

On Wednesday 06 February 2008 14:08, Waterman, DG (David) wrote:
> Hi,
>
> I have a data frame consisting of coordinates on a 10*10 grid, i.e.
>
> > example
>
>     x  y
> 1   4  5
> 2   6  7
> 3   6  6
> 4   7  5
> 5   5  7
> 6   6  7
> 7   4  5
> 8   6  7
> 9   7  6
> 10  5  6
>
> What I would like to do is return a 10*10 matrix consisting of counts
> at each position, so in the above example I would have a matrix where,
> for example, cell [4,5] contains 2 and [6,7] contains 3. At the moment I
> have implemented this using a for loop over the rows of the data frame,
> however the data frames I want to process are very long so the loop
> takes many minutes to complete. Can I do this in a more efficient way?
>
> Cheers,

David,
have a look at "mapply" (?mapply). This does what you need very
quickly.

J

Doran, Harold

2008-Feb-06 14:55 UTC

[R] counting row repetitions without loop

I think this does what you want, but there may be a more efficient way.

x  y
4  5
6  7
6  6
7  5
5  7
6  7
4  5
6  7
7  6
5  6
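dat <- read.table('clipboard', header=TRUE) # copy sample data above ('clipboard' is Windows-only)
dat$patt <- paste(dat$x, dat$y, sep='_')    # key per (x, y) pair; the separator keeps (1,11) and (11,1) distinct
mm <- as.data.frame(with(dat, table(patt))) # count each pair (column Freq)
dat <- merge(dat, mm, by='patt')            # attach the count to every row
mat <- matrix(0, ncol=10, nrow=10)          # empty 10 x 10 grid
gg <- cbind(dat$x, dat$y)                   # two-column (row, col) index matrix
mat[gg] <- dat$Freq                         # fill in counts at the observed positions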
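
For comparison, here is a minimal base R sketch of the same idea without the paste/merge step. It is not from the original post: it assumes the coordinates are whole numbers between 1 and 10, and the names `n`, `idx` and `mat` are only illustrative. Each (x, y) pair is turned into a column-major linear index, and `tabulate()` counts how often each index occurs.

```r
# Sample data from the question, typed in directly.
example <- data.frame(x = c(4, 6, 6, 7, 5, 6, 4, 6, 7, 5),
                      y = c(5, 7, 6, 5, 7, 7, 5, 7, 6, 6))

n <- 10L                                   # grid size (assumed 10 x 10)
idx <- example$x + (example$y - 1L) * n    # column-major linear index of cell [x, y]
mat <- matrix(tabulate(idx, nbins = n * n), nrow = n, ncol = n)

mat[4, 5]   # count for cell [4,5]
mat[6, 7]   # count for cell [6,7]
```

Because `tabulate()` is vectorised, this should avoid the row-by-row loop entirely, but it will silently ignore coordinates outside the assumed range.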

Douglas Bates

2008-Feb-06 19:15 UTC

[R] counting row repetitions without loop

On Feb 6, 2008 8:08 AM, Waterman, DG (David) <david.waterman at diamond.ac.uk> wrote:
> Hi,
> I have a data frame consisting of coordinates on a 10*10 grid, i.e.
> > example
>     x  y
> 1   4  5
> 2   6  7
> 3   6  6
> 4   7  5
> 5   5  7
> 6   6  7
> 7   4  5
> 8   6  7
> 9   7  6
> 10  5  6
> What I would like to do is return a 10*10 matrix consisting of counts
> at each position, so in the above example I would have a matrix where,
> for example, cell [4,5] contains 2 and [6,7] contains 3. At the moment I
> have implemented this using a for loop over the rows of the data frame,
> however the data frames I want to process are very long so the loop
> takes many minutes to complete. Can I do this in a more efficient way?

What you are describing is essentially a cross-tabulation, so you could use

> examp
   x y
1  4 5
2  6 7
3  6 6
4  7 5
5  5 7
6  6 7
7  4 5
8  6 7
9  7 6
10 5 6
> xtabs(~ x + y, examp)
   y
x   5 6 7
  4 2 0 0
  5 0 1 1
  6 0 1 3
  7 1 1 0

This omits the rows and columns which are completely empty but you can
work around that.
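
One possible workaround, sketched here as an assumption rather than as what the poster had in mind, is to convert 'x' and 'y' to factors with the full set of levels 1 to 10 before tabulating, so that empty rows and columns are kept:

```r
# Sample data from the question, typed in directly.
examp <- data.frame(x = c(4, 6, 6, 7, 5, 6, 4, 6, 7, 5),
                    y = c(5, 7, 6, 5, 7, 7, 5, 7, 6, 6))

# Explicit levels force a full 10 x 10 table, including all-zero rows/columns.
full <- table(x = factor(examp$x, levels = 1:10),
              y = factor(examp$y, levels = 1:10))

dim(full)    # 10 10
full[4, 5]   # count for cell [4,5]
full[6, 7]   # count for cell [6,7]
```

The grid size of 10 is taken from the question; for other grids the `levels` argument would need to change accordingly.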

If you have a very large collection of such pairs to summarize you
could consider the version of xtabs in the Matrix package that allows
for the argument sparse = TRUE.  That uses conversion of the "triplet"
form of a sparse matrix to the compressed column form to do the
counting.
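
A minimal sketch of such a call follows. It assumes a current R installation, where, as far as I know, `xtabs()` in the stats package accepts `sparse = TRUE` itself as long as the Matrix package is installed; the explicit factor levels are my addition to keep the full 10 x 10 shape.

```r
library(Matrix)  # provides the sparse matrix classes used by xtabs(sparse = TRUE)

# Sample data from the question, with explicit levels 1..10 (assumed grid size).
examp <- data.frame(x = factor(c(4, 6, 6, 7, 5, 6, 4, 6, 7, 5), levels = 1:10),
                    y = factor(c(5, 7, 6, 5, 7, 7, 5, 7, 6, 6), levels = 1:10))

sp <- xtabs(~ x + y, data = examp, sparse = TRUE)

sp[4, 5]   # count for cell [4,5]
sp[6, 7]   # count for cell [6,7]
```

Only the non-zero cells are stored, which matters when the grid is large and most positions are empty.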

If you want to do this without converting the integers in 'x' and 'y'
to factors you can use a distinctly unobvious function like

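library(Matrix)
sparsetab <- function(x, y)
{
    x <- as.integer(x)
    y <- as.integer(y)
    stopifnot(length(x) == length(y))
    mx <- max(x)
    my <- max(y)
    ## Triplet form: one entry of 1 per (x, y) pair, with zero-based indices.
    ## Conversion to compressed column form sums the duplicated entries,
    ## which gives the counts.  "CsparseMatrix" is used as the target because
    ## recent Matrix versions deprecate as(., "dgCMatrix"); the result is
    ## still of class "dgCMatrix".
    as(new("dgTMatrix", i = x - 1L, j = y - 1L,
           x = rep(1, length(x)), Dim = c(mx, my),
           Dimnames = list(as.character(seq_len(mx)),
                           as.character(seq_len(my)))),
       "CsparseMatrix")
}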

which produces

> with(examp, sparsetab(x, y))
7 x 7 sparse Matrix of class "dgCMatrix"
  1 2 3 4 5 6 7
1 . . . . . . .
2 . . . . . . .
3 . . . . . . .
4 . . . . 2 . .
5 . . . . . 1 1
6 . . . . . 1 3
7 . . . . 1 1 .

One reason to use such a function instead of xtabs is that xtabs
will convert 'x' and 'y' to factors, and if they are stored as character
strings the default ordering of the levels is lexicographic, so '11'
occurs before '2'.  (Numeric vectors are sorted numerically before the
levels are created.)  Again, you can get around that, but the function
shown above is more direct and should be fast enough for most any
application.
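
As a hedged alternative to constructing the triplet object by hand, the `sparseMatrix()` constructor in the Matrix package can be given the pairs directly; its documentation states that the values of duplicated (i, j) pairs are added, which is exactly the counting needed here. The fixed `dims = c(10, 10)` is an assumption taken from the question, not part of the function above.

```r
library(Matrix)

# Sample data from the question, typed in directly.
examp <- data.frame(x = c(4, 6, 6, 7, 5, 6, 4, 6, 7, 5),
                    y = c(5, 7, 6, 5, 7, 7, 5, 7, 6, 6))

# One value of 1 per pair (x is recycled); repeated pairs are summed.
counts <- sparseMatrix(i = examp$x, j = examp$y, x = 1, dims = c(10, 10))

counts[4, 5]              # count for cell [4,5]
counts[6, 7]              # count for cell [6,7]
dense <- as.matrix(counts)  # ordinary 10 x 10 matrix, if one is needed
```

Unlike `sparsetab()`, this fixes the dimensions up front instead of deriving them from `max(x)` and `max(y)`.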
