Dimitri Liakhovitski
2015-Jun-16 17:24 UTC
[R] dplyr - counting a number of specific values in each column - for all columns at once
Hello!

I have a data frame:

md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1), c = c(1,3,4,3,5,5),
                 device = c(1,1,2,2,3,3))
myvars <- c("a", "b", "c")
md[2,3] <- NA
md[4,1] <- NA
md

I want to count the number of 5s in each column, by device. I can do it like this:

library(dplyr)
group_by(md, device) %>%
  summarise(counts.a = sum(a == 5, na.rm = TRUE),
            counts.b = sum(b == 5, na.rm = TRUE),
            counts.c = sum(c == 5, na.rm = TRUE))

However, in real life I'll have tons of variables (the length of 'myvars' can be very large), so I can't write out counts.a, counts.b, etc. manually dozens of times.

Does dplyr allow me to run the count of 5s on all 'myvars' columns at once?

-- 
Dimitri Liakhovitski
Clint Bowman
2015-Jun-16 17:40 UTC
Any problem with

colSums(md == 5, na.rm = TRUE)

Clint Bowman                  INTERNET: clint at ecy.wa.gov
Air Quality Modeler           INTERNET: clint at math.utah.edu
Department of Ecology         VOICE: (360) 407-6815
PO Box 47600                  FAX: (360) 407-7534
Olympia, WA 98504-7600

USPS:     PO Box 47600, Olympia, WA 98504-7600
Parcels:  300 Desmond Drive, Lacey, WA 98503-1274
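Note that colSums(md == 5, ...) counts over all rows at once (and also tests the device column itself), so it returns totals rather than per-device counts. A minimal sketch of a grouped version in base R, assuming the md and myvars objects from the question, uses rowsum() to add up an indicator matrix within each device; the names `is5` and `counts` are illustrative.

```r
# Sample data from the original question
md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1),
                 c = c(1,3,4,3,5,5), device = c(1,1,2,2,3,3))
md[2,3] <- NA
md[4,1] <- NA
myvars <- c("a", "b", "c")

# Indicator matrix (illustrative name): 1 where the value is 5, 0 otherwise,
# NA stays NA. Unary + turns the logical matrix into an integer matrix,
# because rowsum() only accepts numeric input.
is5 <- +(md[myvars] == 5)

# Sum the indicator rows within each device, dropping NAs.
counts <- rowsum(is5, group = md$device, na.rm = TRUE)
counts
# device 1: a = 1, b = 2, c = 0
# device 2: a = 0, b = 1, c = 0
# device 3: a = 1, b = 0, c = 2
```

rowsum() sorts the groups by default (reorder = TRUE), so the rows come out in device order 1, 2, 3.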
Dimitri Liakhovitski
2015-Jun-16 17:42 UTC
No problem at all, Clint. I was just trying to figure out if dplyr can do it.
Clint Bowman
2015-Jun-16 17:48 UTC
It would help if I could see beyond my allergy meds. A start could be:

colSums(subset(md, md$device == 1) == 5, na.rm = TRUE)
colSums(subset(md, md$device == 2) == 5, na.rm = TRUE)
colSums(subset(md, md$device == 3) == 5, na.rm = TRUE)
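The three subset() calls generalize to any number of devices by splitting the selected columns on device and applying colSums() to each piece. A minimal sketch, assuming the question's md and myvars; `per_device` and `counts` are illustrative names.

```r
# Sample data from the original question
md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1),
                 c = c(1,3,4,3,5,5), device = c(1,1,2,2,3,3))
md[2,3] <- NA
md[4,1] <- NA
myvars <- c("a", "b", "c")

# split() returns a list of data frames, one per device value;
# colSums() then counts the 5s in each piece, as in the subset() calls.
per_device <- sapply(split(md[myvars], md$device),
                     function(d) colSums(d == 5, na.rm = TRUE))

# sapply() puts devices in columns; transpose so each row is a device.
counts <- t(per_device)
counts
```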
Bert Gunter
2015-Jun-16 17:50 UTC
Well, dplyr seems a bit of overkill, as it's so simple with plain old vapply() in base R:

> dat <- data.frame(a = sample(1:5, 10, replace = TRUE),
+                   b = sample(3:7, 10, replace = TRUE),
+                   g = sample(7:9, 10, replace = TRUE))
> vapply(dat, function(x) sum(x == 5, na.rm = TRUE), 1L)
a b g
5 4 0

Cheers,
Bert

Bert Gunter

"Data is not information. Information is not knowledge. And knowledge is certainly not wisdom."
   -- Clifford Stoll
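As written, vapply() counts over all rows of every column. A minimal sketch that restricts it to the question's myvars columns (still ignoring device), using FUN.VALUE = integer(1) so that vapply() checks each result is a single integer; `overall` is an illustrative name.

```r
# Sample data from the original question
md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1),
                 c = c(1,3,4,3,5,5), device = c(1,1,2,2,3,3))
md[2,3] <- NA
md[4,1] <- NA
myvars <- c("a", "b", "c")

# Loop over the named columns only; sum() of a logical vector is an integer,
# which matches FUN.VALUE = integer(1).
overall <- vapply(md[myvars], function(x) sum(x == 5, na.rm = TRUE), integer(1))
overall
# a = 2, b = 3, c = 2 (totals over all devices)
```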
Dimitri Liakhovitski
2015-Jun-16 17:56 UTC
Thank you, Bert. To be honest, I am just learning dplyr and was wondering whether it could be done in dplyr. But of course your solution is perfect...
Hadley Wickham
2015-Jun-16 19:47 UTC
md %>%
  group_by(device) %>%
  summarise_each(funs(sum(. == 5, na.rm = TRUE)))

Hadley

-- 
http://had.co.nz/
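In dplyr releases after this thread, summarise_each() and funs() were deprecated. A roughly equivalent sketch, assuming dplyr 1.0.0 or later (where across() is available) and the question's md and myvars; `counts` is an illustrative name.

```r
# Assumes dplyr >= 1.0.0, where across() is available
library(dplyr)

# Sample data from the original question
md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1),
                 c = c(1,3,4,3,5,5), device = c(1,1,2,2,3,3))
md[2,3] <- NA
md[4,1] <- NA
myvars <- c("a", "b", "c")

# across() applies the same lambda to every column named in myvars;
# .names reproduces the counts.a, counts.b, ... names from the question.
counts <- md %>%
  group_by(device) %>%
  summarise(across(all_of(myvars), ~ sum(.x == 5, na.rm = TRUE),
                   .names = "counts.{.col}"))
counts
```

Unlike summarise_each() without a column selection, which summarises every non-grouping column, all_of(myvars) limits the counts to the listed columns.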
Dimitri Liakhovitski
2015-Jun-16 19:52 UTC
Thank you, guys. That's a great lesson: 'summarise_each' and 'funs'.
Bert Gunter
2015-Jun-16 20:02 UTC
... my bad! I failed to read carefully. A base syntax version is:

dat <- data.frame(a = sample(1:5, 10, replace = TRUE),
                  b = sample(3:7, 10, replace = TRUE),
                  g = sample(7:9, 10, replace = TRUE))
dev <- sample(1:3, 10, replace = TRUE)
sapply(dat, function(x) tapply(x, dev, function(v) sum(v == 5, na.rm = TRUE)))

  a b g
1 2 0 0
2 1 3 0
3 2 1 0

I think that, no matter what, there are 2 loops here: an outer one by column and an inner one by device within each column.

Being both old and lazy, I have found it easier and more natural to stick with the basic functional syntax of the "apply" family of functions rather than to learn an alternative database-type syntax (and semantics). My applications were never so large that the possible execution inefficiency mattered. However, it certainly might for others. And of course, what is "natural" for me might not be for others.

Cheers,
Bert
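Bert's nested sapply()/tapply() pattern can also be run on the question's own md data, restricted to the myvars columns. In this minimal sketch the outer sapply() is the loop over columns and the inner tapply() is the loop over devices that he describes; `counts` is an illustrative name.

```r
# Sample data from the original question
md <- data.frame(a = c(3,5,4,5,3,5), b = c(5,5,5,4,4,1),
                 c = c(1,3,4,3,5,5), device = c(1,1,2,2,3,3))
md[2,3] <- NA
md[4,1] <- NA
myvars <- c("a", "b", "c")

# Outer loop: sapply() over the myvars columns.
# Inner loop: tapply() over the device groups within each column.
counts <- sapply(md[myvars], function(x)
  tapply(x, md$device, function(v) sum(v == 5, na.rm = TRUE)))
counts
# rows are devices 1, 2, 3; columns are a, b, c
```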