Clint Bowman
2015-Jun-16 18:06 UTC
[R] dplyr - counting a number of specific values in each column - for all columns at once
May want to add headers, but the following provides the device number with
each set of sums:
for (dev in unique(md$device)) {
  cat(colSums(subset(md, md$device == dev) == 5, na.rm = TRUE), dev, "\n")
}
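As a rough sketch of the "add headers" idea, the per-device sums can also be collected into a labeled matrix instead of being printed with cat(). This assumes `md` and `myvars` as defined in the quoted message below; the names `is_five` and `counts` are introduced here only for illustration.

```r
# Sample data from the thread
md <- data.frame(a = c(3, 5, 4, 5, 3, 5),
                 b = c(5, 5, 5, 4, 4, 1),
                 c = c(1, 3, 4, 3, 5, 5),
                 device = c(1, 1, 2, 2, 3, 3))
myvars <- c("a", "b", "c")
md[2, 3] <- NA
md[4, 1] <- NA

# Logical matrix: TRUE where a value equals 5 (NA stays NA for now)
is_five <- md[myvars] == 5
# Treat a missing value as "not a 5"
is_five[is.na(is_five)] <- FALSE

# rowsum() adds up the rows of a matrix within each group;
# multiplying by 1L turns TRUE/FALSE into 1/0
counts <- rowsum(is_five * 1L, group = md$device)
counts
```

The row names of `counts` are the device values and the column names are the variables, so no separate header line is needed. Unlike the loop above, only the columns named in `myvars` are compared against 5, so the device column itself is not counted.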
Clint Bowman INTERNET: clint at ecy.wa.gov
Air Quality Modeler INTERNET: clint at math.utah.edu
Department of Ecology VOICE: (360) 407-6815
PO Box 47600 FAX: (360) 407-7534
Olympia, WA 98504-7600
USPS: PO Box 47600, Olympia, WA 98504-7600
Parcels: 300 Desmond Drive, Lacey, WA 98503-1274
On Tue, 16 Jun 2015, Dimitri Liakhovitski wrote:
> Except, of course, Bert, that you forgot that it had to be done by
> device. Your solution ignores the device.
>
> md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1),
>                  c = c(1,3,4,3,5,5),
>                  device = c(1,1,2,2,3,3))
> myvars = c("a", "b", "c")
> md[2,3] <- NA
> md[4,1] <- NA
> md
> vapply(md[myvars], function(x) sum(x==5,na.rm=TRUE),1L)
>
> But the result should be by device.
>
> On Tue, Jun 16, 2015 at 1:56 PM, Dimitri Liakhovitski
> <dimitri.liakhovitski at gmail.com> wrote:
>> Thank you, Bert.
>> I'll be honest - I am just learning dplyr and was wondering if one
>> could do it in dplyr.
>> But of course your solution is perfect...
>>
>> On Tue, Jun 16, 2015 at 1:50 PM, Bert Gunter <bgunter.4567 at gmail.com> wrote:
>>> Well, dplyr seems a bit of overkill as it's so simple with plain old
>>> vapply() in base R:
>>>
>>> > dat <- data.frame(a = sample(1:5, 10, replace = TRUE),
>>> +                   b = sample(3:7, 10, replace = TRUE),
>>> +                   g = sample(7:9, 10, replace = TRUE))
>>> > vapply(dat, function(x) sum(x == 5, na.rm = TRUE), 1L)
>>> a b g
>>> 5 4 0
>>>
>>> Cheers,
>>> Bert
>>>
>>> Bert Gunter
>>>
>>> "Data is not information. Information is not knowledge. And knowledge is
>>> certainly not wisdom."
>>> -- Clifford Stoll
>>>
>>> On Tue, Jun 16, 2015 at 10:24 AM, Dimitri Liakhovitski
>>> <dimitri.liakhovitski at gmail.com> wrote:
>>>>
>>>> Hello!
>>>>
>>>> I have a data frame:
>>>>
>>>> md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1),
>>>>                  c = c(1,3,4,3,5,5),
>>>>                  device = c(1,1,2,2,3,3))
>>>> myvars = c("a", "b", "c")
>>>> md[2,3] <- NA
>>>> md[4,1] <- NA
>>>> md
>>>>
>>>> I want to count the number of 5s in each column - by device. I can
>>>> do it like this:
>>>>
>>>> library(dplyr)
>>>> group_by(md, device) %>%
>>>>   summarise(counts.a = sum(a == 5, na.rm = TRUE),
>>>>             counts.b = sum(b == 5, na.rm = TRUE),
>>>>             counts.c = sum(c == 5, na.rm = TRUE))
>>>>
>>>> However, in real life I'll have tons of variables (the length of
>>>> 'myvars' can be very large), so I can't specify those counts.a,
>>>> counts.b, etc. manually dozens of times.
>>>>
>>>> Does dplyr allow running the count of 5s on all 'myvars' columns at
>>>> once?
>>>>
>>>>
>>>> --
>>>> Dimitri Liakhovitski
>>>>
>>>> ______________________________________________
>>>> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>> PLEASE do read the posting guide
>>>> http://www.R-project.org/posting-guide.html
>>>> and provide commented, minimal, self-contained, reproducible code.
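The objection in the quoted exchange is that the vapply() call counts 5s over the whole data frame and ignores the device. One possible way to keep the same vapply() idea and still get results by device is to split the columns by device first. This is only a sketch under the thread's sample data; `by_device` and `counts` are names chosen here, not taken from any of the posts.

```r
# Sample data from the thread
md <- data.frame(a = c(3, 5, 4, 5, 3, 5),
                 b = c(5, 5, 5, 4, 4, 1),
                 c = c(1, 3, 4, 3, 5, 5),
                 device = c(1, 1, 2, 2, 3, 3))
myvars <- c("a", "b", "c")
md[2, 3] <- NA
md[4, 1] <- NA

# One small data frame per device, holding only the columns of interest
by_device <- split(md[myvars], md$device)

# Inner vapply(): count the 5s in each column of one device's rows.
# Outer vapply(): repeat that for every device; t() puts devices in rows.
counts <- t(vapply(
  by_device,
  function(d) vapply(d, function(x) sum(x == 5, na.rm = TRUE), 1L),
  integer(length(myvars))
))
counts
```

The result is an integer matrix with one row per device and one column per variable, so it scales with the length of `myvars` without any column being named by hand.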
Dimitri Liakhovitski
2015-Jun-16 18:11 UTC
[R] dplyr - counting a number of specific values in each column - for all columns at once
Thank you, Clint.
That's the thing: it's relatively easy to do it in base, but the
resulting code is not THAT simple.
I thought dplyr would make it easy...
--
Dimitri Liakhovitski
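For readers arriving at this thread later: a minimal sketch of how the original question might be handled in dplyr itself, assuming dplyr 1.0.0 or newer, where across() is available (that function is newer than this thread, so it is not what any of the posters had at hand). The data and `myvars` are from the original post; `counts` is a name used here for illustration.

```r
library(dplyr)

# Sample data from the thread
md <- data.frame(a = c(3, 5, 4, 5, 3, 5),
                 b = c(5, 5, 5, 4, 4, 1),
                 c = c(1, 3, 4, 3, 5, 5),
                 device = c(1, 1, 2, 2, 3, 3))
myvars <- c("a", "b", "c")
md[2, 3] <- NA
md[4, 1] <- NA

counts <- md %>%
  group_by(device) %>%
  # Apply the same summary to every column named in myvars;
  # .names builds the output names counts.a, counts.b, ...
  summarise(across(all_of(myvars),
                   ~ sum(.x == 5, na.rm = TRUE),
                   .names = "counts.{.col}"))
counts
```

This gives one row per device with the same counts.a, counts.b, counts.c columns as the hand-written summarise() call, but the column list is driven entirely by `myvars`.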
Clint Bowman
2015-Jun-16 18:18 UTC
[R] dplyr - counting a number of specific values in each column - for all columns at once
Thanks, Dimitri. Bert is the real wizard here--I'll bet he can conjure up
an elegant solution.
For me, just reaching a desired endpoint is enough <g>.
Clint