Dimitri Liakhovitski
2015-Jun-16 17:56 UTC
[R] dplyr - counting a number of specific values in each column - for all columns at once
Thank you, Bert.
I'll be honest - I am just learning dplyr and was wondering if one
could do it in dplyr.
But of course your solution is perfect...

On Tue, Jun 16, 2015 at 1:50 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
> Well, dplyr seems a bit of overkill as it's so simple with plain old
> vapply() in base R:
>
>   dat <- data.frame(a = sample(1:5, 10, replace = TRUE),
>                     b = sample(3:7, 10, replace = TRUE),
>                     g = sample(7:9, 10, replace = TRUE))
>   vapply(dat, function(x) sum(x == 5, na.rm = TRUE), 1L)
>
>   a b g
>   5 4 0
>
> Cheers,
> Bert
>
> On Tue, Jun 16, 2015 at 10:24 AM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> Hello!
>>
>> I have a data frame:
>>
>>   md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1),
>>                    c = c(1,3,4,3,5,5),
>>                    device = c(1,1,2,2,3,3))
>>   myvars = c("a", "b", "c")
>>   md[2,3] <- NA
>>   md[4,1] <- NA
>>   md
>>
>> I want to count the number of 5s in each column - by device. I can do
>> it like this:
>>
>>   library(dplyr)
>>   group_by(md, device) %>%
>>     summarise(counts.a = sum(a == 5, na.rm = TRUE),
>>               counts.b = sum(b == 5, na.rm = TRUE),
>>               counts.c = sum(c == 5, na.rm = TRUE))
>>
>> However, in real life I'll have tons of variables (the length of
>> 'myvars' can be very large) - so that I can't specify those counts.a,
>> counts.b, etc. manually - dozens of times.
>>
>> Does dplyr allow running the count of 5s on all 'myvars' columns at once?
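For the dplyr question quoted above, one possible approach is across() inside summarise(). This is only a sketch and assumes dplyr 1.0.0 or later, which postdates this thread; the result name `counts` is made up for illustration. The `.names` argument reproduces the counts.a, counts.b, ... naming from the manual version.

```r
library(dplyr)

# Example data from the original question
md <- data.frame(a = c(3, 5, 4, 5, 3, 5),
                 b = c(5, 5, 5, 4, 4, 1),
                 c = c(1, 3, 4, 3, 5, 5),
                 device = c(1, 1, 2, 2, 3, 3))
myvars <- c("a", "b", "c")
md[2, 3] <- NA
md[4, 1] <- NA

# Count the 5s in every column named in myvars, separately per device.
# `counts` is a hypothetical name, not something from the thread.
counts <- md %>%
  group_by(device) %>%
  summarise(across(all_of(myvars),
                   ~ sum(.x == 5, na.rm = TRUE),
                   .names = "counts.{.col}"))

print(counts)
```

Because the columns are selected with all_of(myvars), the same call should work unchanged however long 'myvars' gets.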
Dimitri Liakhovitski
2015-Jun-16 17:58 UTC
[R] dplyr - counting a number of specific values in each column - for all columns at once
Except, of course, Bert, that you forgot that it had to be done by
device. Your solution ignores the device.

  md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
                   device = c(1,1,2,2,3,3))
  myvars = c("a", "b", "c")
  md[2,3] <- NA
  md[4,1] <- NA
  md
  vapply(md[myvars], function(x) sum(x==5,na.rm=TRUE),1L)

But the result should be by device.
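One way the vapply() idea could be made device-aware in base R is to split the selected columns by device first and apply the same counting function to each piece. This is a minimal sketch, not something posted in the thread; the names `count_fives` and `by_device` are made up for illustration.

```r
# Example data from the message above
md <- data.frame(a = c(3, 5, 4, 5, 3, 5),
                 b = c(5, 5, 5, 4, 4, 1),
                 c = c(1, 3, 4, 3, 5, 5),
                 device = c(1, 1, 2, 2, 3, 3))
myvars <- c("a", "b", "c")
md[2, 3] <- NA
md[4, 1] <- NA

# Hypothetical helper: per-column count of 5s in one data frame
count_fives <- function(d) {
  vapply(d, function(x) sum(x == 5, na.rm = TRUE), 1L)
}

# split() yields one data frame per device; sapply() binds the per-device
# count vectors into a matrix, and t() puts devices in rows
by_device <- t(sapply(split(md[myvars], md$device), count_fives))

print(by_device)
```

The result is an integer matrix whose row names are the device values and whose column names are the entries of 'myvars'.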
Clint Bowman
2015-Jun-16 18:06 UTC
[R] dplyr - counting a number of specific values in each column - for all columns at once
May want to add headers, but the following provides the device number
with each set of sums:

  for (dev in unique(md$device)) {
    cat(colSums(subset(md, md$device == dev) == 5, na.rm = TRUE), dev, "\n")
  }

Clint Bowman
Air Quality Modeler
Department of Ecology
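If a labelled result is preferred over printed lines, aggregate() from base R is one option. The sketch below is an illustration rather than anything proposed in the thread, and the result name `counts` is an assumption. It restricts the counting to the 'myvars' columns, so the device column itself is not included in the sums.

```r
# Example data from the thread
md <- data.frame(a = c(3, 5, 4, 5, 3, 5),
                 b = c(5, 5, 5, 4, 4, 1),
                 c = c(1, 3, 4, 3, 5, 5),
                 device = c(1, 1, 2, 2, 3, 3))
myvars <- c("a", "b", "c")
md[2, 3] <- NA
md[4, 1] <- NA

# One row per device, one column of counts per variable in myvars;
# the names in `by` become the header of the grouping column
counts <- aggregate(md[myvars],
                    by = list(device = md$device),
                    FUN = function(x) sum(x == 5, na.rm = TRUE))

print(counts)
```

The output is an ordinary data frame with a header row (device, a, b, c), which may be easier to merge or export than the cat() output.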