Clint Bowman
2015-Jun-16 18:06 UTC
[R] dplyr - counting a number of specific values in each column - for all columns at once
May want to add headers, but the following prints the device number with each set of sums:

    for (dev in unique(md$device)) {
      cat(colSums(subset(md, md$device == dev) == 5, na.rm = TRUE), dev, "\n")
    }

Clint Bowman                    INTERNET:       clint at ecy.wa.gov
Air Quality Modeler             INTERNET:       clint at math.utah.edu
Department of Ecology           VOICE:          (360) 407-6815
PO Box 47600                    FAX:            (360) 407-7534
Olympia, WA 98504-7600

        USPS:           PO Box 47600, Olympia, WA 98504-7600
        Parcels:        300 Desmond Drive, Lacey, WA 98503-1274

On Tue, 16 Jun 2015, Dimitri Liakhovitski wrote:

> Except, of course, Bert, that you forgot that it had to be done by
> device. Your solution ignores the device.
>
> md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
>                  device = c(1,1,2,2,3,3))
> myvars = c("a", "b", "c")
> md[2,3] <- NA
> md[4,1] <- NA
> md
> vapply(md[myvars], function(x) sum(x==5,na.rm=TRUE),1L)
>
> But the result should be by device.
>
> On Tue, Jun 16, 2015 at 1:56 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> Thank you, Bert.
>> I'll be honest - I am just learning dplyr and was wondering if one
>> could do it in dplyr.
>> But of course your solution is perfect...
>>
>> On Tue, Jun 16, 2015 at 1:50 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>> Well, dplyr seems a bit of overkill as it's so simple with plain old
>>> vapply() in base R :
>>>
>>> > dat <- data.frame (a=sample(1:5,10,rep=TRUE),
>>> +                    b=sample(3:7,10,rep=TRUE),
>>> +                    g = sample(7:9,10,rep=TRUE))
>>>
>>> > vapply(dat,function(x)sum(x==5,na.rm=TRUE),1L)
>>>
>>> a b g
>>> 5 4 0
>>>
>>> Cheers,
>>> Bert
>>>
>>> Bert Gunter
>>>
>>> "Data is not information. Information is not knowledge. And knowledge is
>>> certainly not wisdom."
>>> -- Clifford Stoll
>>>
>>> On Tue, Jun 16, 2015 at 10:24 AM, Dimitri Liakhovitski
>>> <dimitri.liakhovitski at gmail.com> wrote:
>>>>
>>>> Hello!
>>>>
>>>> I have a data frame:
>>>>
>>>> md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1),
>>>>                  c = c(1,3,4,3,5,5),
>>>>                  device = c(1,1,2,2,3,3))
>>>> myvars = c("a", "b", "c")
>>>> md[2,3] <- NA
>>>> md[4,1] <- NA
>>>> md
>>>>
>>>> I want to count number of 5s in each column - by device. I can do it like
>>>> this:
>>>>
>>>> library(dplyr)
>>>> group_by(md, device) %>%
>>>>   summarise(counts.a = sum(a==5, na.rm = T),
>>>>             counts.b = sum(b==5, na.rm = T),
>>>>             counts.c = sum(c==5, na.rm = T))
>>>>
>>>> However, in real life I'll have tons of variables (the length of
>>>> 'myvars' can be very large) - so that I can't specify those counts.a,
>>>> counts.b, etc. manually - dozens of times.
>>>>
>>>> Does dplyr allow to run the count of 5s on all 'myvars' columns at once?
>>>>
>>>> --
>>>> Dimitri Liakhovitski
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>> --
>> Dimitri Liakhovitski
>
> --
> Dimitri Liakhovitski
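For readers who want the per-device sums with column headers rather than printed rows, one possible base R variant is sketched below. It is not from the thread: it swaps the for loop for aggregate(), and the name `counts` is only illustrative. The data frame is the one from the original post.

```r
# Example data from the original post
md <- data.frame(a = c(3, 5, 4, 5, 3, 5),
                 b = c(5, 5, 5, 4, 4, 1),
                 c = c(1, 3, 4, 3, 5, 5),
                 device = c(1, 1, 2, 2, 3, 3))
myvars <- c("a", "b", "c")
md[2, 3] <- NA
md[4, 1] <- NA

# Count the 5s in every 'myvars' column, one row per device.
# 'counts' is a hypothetical name; NAs are ignored via na.rm = TRUE.
counts <- aggregate(md[myvars],
                    by = list(device = md$device),
                    FUN = function(x) sum(x == 5, na.rm = TRUE))
print(counts)
```

Because only the columns named in `myvars` are passed in, the `device` column is used for grouping and is not itself counted, and the result is a data frame that keeps the column names.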
Dimitri Liakhovitski
2015-Jun-16 18:11 UTC
[R] dplyr - counting a number of specific values in each column - for all columns at once
Thank you, Clint.
That's the thing: it's relatively easy to do it in base, but the
resulting code is not THAT simple.
I thought dplyr would make it easy...

--
Dimitri Liakhovitski
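As a sketch of what the dplyr route might look like today: the snippet below assumes dplyr 1.0.0 or later, which provides across() and postdates this thread, so it is not something the participants could have used as written. The name `counts` is illustrative; the `counts.` prefix mirrors the column names in the original post.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across()

# Example data from the original post
md <- data.frame(a = c(3, 5, 4, 5, 3, 5),
                 b = c(5, 5, 5, 4, 4, 1),
                 c = c(1, 3, 4, 3, 5, 5),
                 device = c(1, 1, 2, 2, 3, 3))
myvars <- c("a", "b", "c")
md[2, 3] <- NA
md[4, 1] <- NA

# Apply the same count-of-5s summary to every column named in 'myvars',
# grouped by device; .names builds counts.a, counts.b, counts.c.
counts <- md %>%
  group_by(device) %>%
  summarise(across(all_of(myvars),
                   ~ sum(.x == 5, na.rm = TRUE),
                   .names = "counts.{.col}"))
print(counts)
```

Since the columns come from the character vector `myvars`, the call stays the same however many variables are listed.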
Clint Bowman
2015-Jun-16 18:18 UTC
[R] dplyr - counting a number of specific values in each column - for all columns at once
Thanks, Dimitri. Bert is the real wizard here--I'll bet he can conjure up an elegant solution. For me, just reaching a desired endpoint is enough <g>.

Clint