thr3ads.net - R help - [R] tapply [Jun 2005]

Weiwei Shi

2005-Jun-20 23:15 UTC

[R] tapply

hi,
i have another question on tapply:
i have a dataset z like this:
5540 389100307391      2600
5541 389100307391      2600
5542 389100307391      2600
5543 389100307391      2600
5544 389100307391      2600
5546 381300302513        NA
5547 387000307470        NA
5548 387000307470        NA
5549 387000307470        NA
5550 387000307470        NA
5551 387000307470        NA
5552 387000307470        NA

I want to sum the column 3 by column 2.
I removed NA by calling:
tapply(z[[3]], z[[2]], sum, na.rm=T)
but it does not work.

then, i used
z1<-z[!is.na(z[[3]]),]
and repeat
still doesn't work.

please help.

-- 
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III

Jim Brennan

2005-Jun-20 23:42 UTC

[R] tapply

This may help
R>wei
     V1           V2   V3
1  5540 389100307391 2600
2  5541 389100307391 2600
3  5542 389100307391 2600
4  5543 389100307391 2600
5  5544 389100307391 2600
6  5546 381300302513   NA
7  5547 387000307470   NA
8  5548 387000307470   NA
9  5549 387000307470   NA
10 5550 387000307470   NA
11 5551 387000307470   NA
12 5552 387000307470   NA
R>ave(wei[,3],wei[,2],FUN=sum)
 [1] 13000 13000 13000 13000 13000    NA    NA    NA    NA    NA    NA    NA
R>
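
For comparison with tapply(), here is a minimal sketch of how ave() behaves on a small made-up vector (the names `saved` and `claim` are illustrative, not from the original data). ave() returns one value per input row rather than one per group, and because its `...` arguments are grouping variables, `na.rm` has to be passed through a wrapper function.

```r
# Made-up data: three groups, one complete, one all-NA, one mixed.
saved <- c(2600, 2600, NA, NA, 100, NA)
claim <- c("a", "a", "b", "b", "c", "c")

# Each row receives the sum of its own group; any NA in a group gives NA.
per_row <- ave(saved, claim, FUN = sum)

# To skip NAs, wrap sum(); writing ave(saved, claim, na.rm = TRUE) would
# treat na.rm as another grouping variable instead.
per_row_narm <- ave(saved, claim, FUN = function(v) sum(v, na.rm = TRUE))

print(per_row)
print(per_row_narm)
```

Note that the all-NA group "b" sums to 0 once NAs are skipped, which is the same effect discussed later in the thread for tapply().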

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide!
http://www.R-project.org/posting-guide.html

Marc Schwartz

2005-Jun-20 23:46 UTC

[R] tapply

On Mon, 2005-06-20 at 18:15 -0500, Weiwei Shi wrote:
> hi,
> I want to sum the column 3 by column 2.
> I removed NA by calling:
> tapply(z[[3]], z[[2]], sum, na.rm=T)
> but it does not work.

The index vector(s) in tapply() need to be a "list". See the description
of the INDEX argument in ?tapply:

> tapply(z[[3]], list(z[[2]]), sum, na.rm = TRUE)
381300302513 387000307470 389100307391 
           0            0        13000 


Note that the use of na.rm = TRUE here results in misleading values of 0
for the other two groups, which are all NA's and this is not
self-evident unless you know the data.

You may be better off with:
> tapply(z[[3]], list(z[[2]]), sum)
381300302513 387000307470 389100307391 
          NA           NA        13000 

unless your real data is a mix of NA's and measured values.

Also see ?complete.cases and ?na.omit for further approaches to dealing
with such data sets.

HTH,

Marc Schwartz
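
The difference between the two tapply() calls in this reply can be reproduced with a small sketch. The data frame below is rebuilt by hand from the rows shown in the question; the column names `id`, `claim` and `saved` and the helper `sum_or_na` are assumptions made for illustration only.

```r
# Toy copy of the rows shown in the question (column names are made up).
z <- data.frame(
  id    = c(5540:5544, 5546:5552),
  claim = c(rep("389100307391", 5), "381300302513", rep("387000307470", 6)),
  saved = c(rep(2600, 5), rep(NA, 7)),
  stringsAsFactors = FALSE
)

# Without na.rm, a group that contains any NA sums to NA.
sums_strict <- tapply(z$saved, list(z$claim), sum)

# With na.rm = TRUE, an all-NA group sums to 0, since the sum of nothing is 0.
sums_narm <- tapply(z$saved, list(z$claim), sum, na.rm = TRUE)

# Hypothetical helper: skip scattered NAs but keep all-NA groups visible as NA.
sum_or_na <- function(v) if (all(is.na(v))) NA_real_ else sum(v, na.rm = TRUE)
sums_safe <- tapply(z$saved, list(z$claim), sum_or_na)

print(sums_strict)
print(sums_narm)
print(sums_safe)
```

A helper like `sum_or_na` is one possible way to handle real data that mixes NAs and measured values without turning empty groups into misleading zeros.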

Douglas Bates

2005-Jun-20 23:49 UTC

[R] tapply

On 6/20/05, Weiwei Shi <helprhelp at gmail.com> wrote:
> hi,
> then, i used
> z1<-z[!is.na(z[[3]]),]
> and repeat
> still doesn't work.
Can you be more explicit about "doesn't work"?

Gabor Grothendieck

2005-Jun-21 02:26 UTC

[R] tapply

On 6/20/05, Weiwei Shi <helprhelp at gmail.com> wrote:
> hi,
> I want to sum the column 3 by column 2.
> I removed NA by calling:
> tapply(z[[3]], z[[2]], sum, na.rm=T)
> but it does not work.
> 
Depending on what you want you may be able to use rowsum:

- display only groups that have at least one non-NA with the sum
  being the sum of the non-NAs:

	with(na.omit(z), rowsum(V3, V2))

- display all groups with the sum being NA if any member is NA:

	rowsum(z$V3, z$V2)
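
The two rowsum() variants above can be tried on a small made-up data frame. This is only a sketch: the values are invented, and the columns are named `V2` and `V3` to match the calls in this reply.

```r
# Made-up data: group "a" is complete, "b" is all NA, "c" has a single value.
z <- data.frame(
  V2 = c("a", "a", "b", "b", "c"),
  V3 = c(10, 20, NA, NA, 5),
  stringsAsFactors = FALSE
)

# All groups are listed; a group with any NA member sums to NA.
all_groups <- rowsum(z$V3, z$V2)

# Rows with NA are dropped first, so group "b" disappears from the result.
complete_only <- with(na.omit(z), rowsum(V3, V2))

print(all_groups)
print(complete_only)
```

rowsum() returns a one-column matrix with the group labels as row names, so individual sums can be picked out with, for example, `all_groups["a", 1]`.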

Liaw, Andy

2005-Jun-21 16:47 UTC

[R] tapply

What does str(z) say?  I suspect the second column is a factor, which, after
the subsetting, has some empty levels.  If so, just drop those levels.

Andy
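
The suspicion above can be illustrated with a minimal sketch. The data and the names `y`, `claim` and `saved` are made up; the point is only that subsetting a data frame keeps every level of a factor column, and tapply() reports a cell for each level, used or not.

```r
# Made-up source data with a factor grouping column.
y <- data.frame(
  claim = factor(c("c1", "c1", "c2", "c3")),
  saved = c(10, 20, 30, 40)
)

# Keep only the matched claims; the rows for "c2" are gone...
z <- y[y$claim %in% c("c1", "c3"), ]

# ...but the level "c2" is still attached to the factor.
n_levels_before <- nlevels(z$claim)

# tapply() returns one cell per level, so the empty level shows up as NA.
with_empty <- tapply(z$saved, z$claim, sum)

# Re-creating the factor drops unused levels (droplevels() is an alternative
# in more recent versions of R).
z$claim <- factor(z$claim)
without_empty <- tapply(z$saved, z$claim, sum)

print(with_empty)
print(without_empty)
```

This would explain why removing NA rows with is.na() made no difference: the extra entries come from the factor's levels, not from the rows.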
> From: Weiwei Shi
> 
> hi
> i tried all the methods suggested above:
> ave and rowsum with the "with" function work for my situation. I think
> the problem might not be due to tapply.
> My data z comes from
> z<-y[y[[1]] %in% x[[2]], c(1,9)]
> 
> so z is supposed to have no entries for those non-matched
> between x and y.
> 
> however, when I run tapply, the result also includes those
> non-matched entries. I used the is.na function to remove those entries
> from z first and then ran tapply again, but the result is the same: those
> NA's and those non-matched results are still there. That's what I mean
> by "it doesn't work".
> 
> Is there something I missed here so that z "implicitly" has some
> "trace" back to the y dataset?
> 
> thanks,

Liaw, Andy

2005-Jun-21 17:30 UTC

[R] tapply

Try:
> (x <- factor(1:2, levels=1:5))
[1] 1 2
Levels: 1 2 3 4 5
> (x <- x[, drop=TRUE])
[1] 1 2
Levels: 1 2

Andy
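
As a sketch of how the `drop=TRUE` idiom might be applied at the matching step, and of what to watch for when de-factoring, consider the made-up values below (the variable names are illustrative). One caveat that is easy to miss: as.integer() on a factor returns the internal level codes, not the numbers printed as labels, so converting through as.character() is the usual route.

```r
# Dropping unused levels while subsetting a factor directly.
claim <- factor(c("c1", "c2", "c3"))
kept  <- claim[claim %in% c("c1", "c3"), drop = TRUE]

# The same idiom applied to an existing factor, as in the reply above.
x  <- factor(1:2, levels = 1:5)
x2 <- x[, drop = TRUE]

# De-factoring: as.integer() yields level codes, not the claim numbers.
f      <- factor(c("389100307391", "387000307470", "387000307470"))
codes  <- as.integer(f)                 # positions in levels(f)
values <- as.numeric(as.character(f))   # the numbers behind the labels

print(levels(kept))
print(levels(x2))
print(codes)
print(values)
```

Claim numbers of this size do not fit in R's integer type, which is another reason to go through as.numeric(as.character()) or simply to keep the column as character.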
> From: Weiwei Shi [mailto:helprhelp at gmail.com]
> 
> Even before I tried, I already realized it must be true when I read
> this reply! Great job! thanks, Andy.
> 
> > str(z)
> `data.frame':   235 obs. of  2 variables:
>  $ CLAIMNUM : Factor w/ 1907 levels "0","10000001849",..: 1083 1083 1083 1582 1582 1084 1681 1681 1391 1391 ...
>  $ SIU.SAVED: int  475 3000 3000 0 0 4352 0 0 4500 3000 ...
> 
> So, I have another general question: how to avoid this when I
> do the matching?
> In my case, claimnum does not have to be a factor.  I think I can do
> as.integer on it to de-factor it. But, I want to know how to do it while
> keeping it as a factor? btw, what's your way to drop those levels?  :)
> 
> weiwei
