hi,
i have another question on tapply:
i have a dataset z like this:

5540 389100307391 2600
5541 389100307391 2600
5542 389100307391 2600
5543 389100307391 2600
5544 389100307391 2600
5546 381300302513 NA
5547 387000307470 NA
5548 387000307470 NA
5549 387000307470 NA
5550 387000307470 NA
5551 387000307470 NA
5552 387000307470 NA

I want to sum column 3 by column 2.
I tried to remove the NAs by calling:

tapply(z[[3]], z[[2]], sum, na.rm=TRUE)

but it does not work.

Then I used

z1 <- z[!is.na(z[[3]]), ]

and repeated the call; it still doesn't work.

please help.

--
Weiwei Shi, Ph.D

"Did you always know?"
"No, I did not. But I believed..."
---Matrix III
This may help

R> wei
     V1           V2   V3
1  5540 389100307391 2600
2  5541 389100307391 2600
3  5542 389100307391 2600
4  5543 389100307391 2600
5  5544 389100307391 2600
6  5546 381300302513   NA
7  5547 387000307470   NA
8  5548 387000307470   NA
9  5549 387000307470   NA
10 5550 387000307470   NA
11 5551 387000307470   NA
12 5552 387000307470   NA
R> ave(wei[,3], wei[,2], FUN=sum)
 [1] 13000 13000 13000 13000 13000    NA    NA    NA    NA    NA    NA    NA

-----Original Message-----
From: r-help-bounces at stat.math.ethz.ch [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Weiwei Shi
Sent: June 20, 2005 7:16 PM
To: R-help at stat.math.ethz.ch
Subject: [R] tapply

I want to sum the column 3 by column 2.
I removed NA by calling:
tapply(z[[3]], z[[2]], sum, na.rm=T)
but it does not work.

______________________________________________
R-help at stat.math.ethz.ch mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide! http://www.R-project.org/posting-guide.html
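The ave() call above returns one total per row, and an NA for every row in an all-NA group. As a minimal sketch, assuming the data frame is rebuilt by hand with the column names V2 and V3 shown in the printout, na.rm can be passed through a small wrapper function; note that all-NA groups then come back as 0 rather than NA:

```r
# Rebuild the two relevant columns of the example data
# (V2 is stored as character here purely for illustration).
wei <- data.frame(
  V2 = c(rep("389100307391", 5), "381300302513", rep("387000307470", 6)),
  V3 = c(rep(2600, 5), rep(NA, 7))
)

# ave() applies FUN within each group and recycles the result to every row
# of that group, so the output has one entry per input row.
tot <- ave(wei$V3, wei$V2, FUN = function(v) sum(v, na.rm = TRUE))
print(tot)
```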
On Mon, 2005-06-20 at 18:15 -0500, Weiwei Shi wrote:
> I want to sum the column 3 by column 2.
> I removed NA by calling:
> tapply(z[[3]], z[[2]], sum, na.rm=T)
> but it does not work.

The index vector(s) in tapply() need to be a "list". See the description of the INDEX argument in ?tapply:

> tapply(z[[3]], list(z[[2]]), sum, na.rm = TRUE)
381300302513 387000307470 389100307391
           0            0        13000

Note that using na.rm = TRUE here yields misleading values of 0 for the other two groups, which consist entirely of NA's; this is not self-evident unless you know the data. You may be better off with:

> tapply(z[[3]], list(z[[2]]), sum)
381300302513 387000307470 389100307391
          NA           NA        13000

unless your real data is a mix of NA's and measured values.

Also see ?complete.cases and ?na.omit for further approaches to dealing with such data sets.

HTH,

Marc Schwartz
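When the real data mixes NA's and measured values, one possible middle ground is a summary function that ignores NA's within a group but still reports NA for a group with no measured values at all. This is only a sketch: the helper name sum_or_na is hypothetical, and the data frame is a hand-built stand-in for z.

```r
# Stand-in for the poster's z (column names V2/V3 are assumed)
z <- data.frame(
  V2 = c(rep("389100307391", 5), "381300302513", rep("387000307470", 6)),
  V3 = c(rep(2600, 5), rep(NA, 7))
)

# Hypothetical helper: sum the non-missing values, but return NA
# when every value in the group is missing.
sum_or_na <- function(v) {
  if (all(is.na(v))) NA_real_ else sum(v, na.rm = TRUE)
}

res <- tapply(z$V3, list(z$V2), sum_or_na)
print(res)
```

With this helper a result of 0 can only come from a group whose measured values really sum to zero.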
On 6/20/05, Weiwei Shi <helprhelp at gmail.com> wrote:
> tapply(z[[3]], z[[2]], sum, na.rm=T)
> but it does not work.
>
> then, i used
> z1<-z[!is.na(z[[3]]),]
> and repeat
> still doesn't work.

Can you be more explicit about "doesn't work"?
On 6/20/05, Weiwei Shi <helprhelp at gmail.com> wrote:
> I want to sum the column 3 by column 2.

Depending on what you want you may be able to use rowsum:

- display only groups that have at least one non-NA, with the sum being the sum of the non-NAs:

  with(na.omit(z), rowsum(V3, V2))

- display all groups, with the sum being NA if any member is NA:

  rowsum(z$V3, z$V2)
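The two rowsum() variants can be compared side by side on a hand-built copy of the example data. This is a sketch that assumes the columns are named V1, V2 and V3 and stores V2 as character; the poster's real z may differ.

```r
# Hand-built copy of the example data (column names are assumed)
z <- data.frame(
  V1 = c(5540:5544, 5546:5552),
  V2 = c(rep("389100307391", 5), "381300302513", rep("387000307470", 6)),
  V3 = c(rep(2600, 5), rep(NA, 7)),
  stringsAsFactors = FALSE
)

# Variant 1: drop incomplete rows first, so only groups with data remain
only_observed <- with(na.omit(z), rowsum(V3, V2))

# Variant 2: keep every group; a group containing an NA sums to NA
all_groups <- rowsum(z$V3, z$V2)

print(only_observed)
print(all_groups)
```

Both calls return a one-column matrix whose row names are the group labels.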
What does str(z) say? I suspect the second column is a factor, which, after the subsetting, has some empty levels. If so, just drop those levels.

Andy

> From: Weiwei Shi
>
> hi
> i tried all the methods suggested above:
> ave and rowsum with "with" function works for my situation. I think
> the problem might not be due to tapply.
> My data z comes from
> z<-y[y[[1]] %in% x[[2]], c(1,9)]
>
> while z is supposed to have no entries for those non-matched
> between x and y.
>
> however, when I run tapply, the result also includes those
> non-matched entries. I use the is.na function to remove those entries
> from z first and then use tapply again, but the result is the same:
> those NA's and those non-matched results are still there. That's what
> I mean by "it doesn't work".
>
> Is there something I missed here so that z "implicitly" has some
> "trace" back to y dataset?
>
> thanks,
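The "trace back to y" can be reproduced on a toy example. The sketch below uses made-up data (the names y, keep, id and amt are illustrative, not the poster's): subsetting rows of a data frame keeps all levels of a factor column, and tapply() then reports the now-empty levels as NA.

```r
# Toy stand-in for y; 'id' is deliberately a factor
y <- data.frame(
  id  = factor(c("a", "a", "b", "c", "c")),
  amt = c(10, 20, 5, 1, 2)
)
keep <- c("a", "c")            # plays the role of x[[2]] in the thread

# Row subsetting removes the "b" rows but not the "b" level
z <- y[y$id %in% keep, ]
print(levels(z$id))

# tapply() produces one cell per level, so "b" shows up as NA
res <- tapply(z$amt, z$id, sum)
print(res)
```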
Try:

> (x <- factor(1:2, levels=1:5))
[1] 1 2
Levels: 1 2 3 4 5
> (x <- x[, drop=TRUE])
[1] 1 2
Levels: 1 2

Andy

> From: Weiwei Shi [mailto:helprhelp at gmail.com]
>
> Even before I tried, I already realized it must be true when I read
> this reply! Great job! thanks, Andy.
>
> > str(z)
> `data.frame': 235 obs. of 2 variables:
> $ CLAIMNUM : Factor w/ 1907 levels "0","10000001849",..: 1083 1083 1083 1582 1582 1084 1681 1681 1391 1391 ...
> $ SIU.SAVED: int 475 3000 3000 0 0 4352 0 0 4500 3000 ...
>
> So, I have another general question: how to avoid this when I
> do the matching?
> In my case, claimnum does not have to be a factor. I think I can do
> as.integer on it to de-factor it. But I want to know how to do it while
> keeping it as a factor. btw, what's your way to drop those levels? :)
>
> weiwei
>
> On 6/21/05, Liaw, Andy <andy_liaw at merck.com> wrote:
> > What does str(z) say? I suspect the second column is a factor, which,
> > after the subsetting, has some empty levels. If so, just drop
> > those levels.
> >
> > Andy
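Applied to a data frame column, the same idea might look like the sketch below. The data and names (y, keep, id, amt) are made up for illustration. Re-creating the factor with factor() has the same effect as the [, drop=TRUE] idiom, and newer versions of R also offer droplevels(), which was not available when this thread was written.

```r
# Toy data: 'id' is a factor with three levels
y <- data.frame(
  id  = factor(c("a", "a", "b", "c", "c")),
  amt = c(10, 20, 5, 1, 2)
)
keep <- c("a", "c")
z <- y[y$id %in% keep, ]         # level "b" is now empty but still present

# Drop the unused levels while keeping the column a factor
z$id <- z$id[, drop = TRUE]      # equivalently: z$id <- factor(z$id)

res <- tapply(z$amt, z$id, sum)  # only the levels that occur in z remain
print(levels(z$id))
print(res)
```

One caveat on the as.integer idea raised in the quoted message: as.integer() applied directly to a factor returns the internal level codes rather than the labels, so as.character() is usually the safer first step when de-factoring.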