The dplyr::select function returns a special variety of data.frame called a tibble. The tibble has certain features designed to make it behave consistently when indexing is used. Specifically, the `[` operator always returns a tibble regardless of how many columns are indicated by the column index. This is unlike the conventional data frame which returns a vector when exactly one column is indicated by the column index, or a data.frame if more than one is indicated.

A syntax that consistently yields a column vector with both tibbles and data.frames is

dta[[ 1 ]]

so

ctab <- function(data) {
  CrossTable(data[[1]], data[[2]], prop.chisq = FALSE, prop.c = FALSE,
             prop.t = FALSE, format = "SPSS")
}

should work.

On September 20, 2019 10:59:46 AM PDT, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> On 20/09/2019 11:30 a.m., Zachary Lim wrote:
>> Hi,
>>
>> I'm trying to create a simple function that takes a dataframe as its only argument. I've been using gmodels::CrossTable, but it requires a lot of arguments, e.g.:
>>
>> #this runs fine
>> CrossTable(data$col1, data$col2, prop.chisq = FALSE, prop.c = FALSE, prop.t = FALSE, format = "SPSS")
>>
>> Moreover, I wanted to make it compatible with piping, so I decided to create the following function:
>>
>> ctab <- function(data) {
>>   CrossTable(data[,1], data[,2], prop.chisq = FALSE, prop.c = FALSE, prop.t = FALSE, format = "SPSS")
>> }
>>
>> When I try to use this function, however, I get the following error:
>>
>> #this results in 'Error: Must use a vector in `[`, not an object of class matrix.'
>> data %>% select(col1, col2) %>% ctab()
>>
>> I tried searching online but couldn't find much about that error (except for in specific and unrelated cases). Moreover, when I created a very simple dataset, it turns out there's no problem:
>>
>> #this runs fine
>> data.frame(C1 = c('x','y','x','y'), C2 = c('a','a','b','b')) %>% ctab()
>>
>> Is this a problem with my function or the data? If it's the data, why does directly calling CrossTable work?
>
> Presumably data %>% select(col1, col2) isn't giving you a dataframe. However, you haven't given us a reproducible example, so I can't tell you what it's doing. But that's where you should look.
>
> Duncan Murdoch
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.
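The indexing difference described in this message can be sketched as follows. This is only an illustration, assuming the tibble package is installed; the objects df and tb are made up for the example and are not the original poster's data.

```r
library(tibble)

# Hypothetical two-column data, once as a data.frame and once as a tibble
df <- data.frame(C1 = c("x", "y", "x", "y"), C2 = c("a", "a", "b", "b"))
tb <- as_tibble(df)

# Single-column `[` drops to a plain vector for a data.frame ...
is.data.frame(df[, 1])        # FALSE
# ... but still returns a (one-column) tibble for a tibble
is.data.frame(tb[, 1])        # TRUE

# `[[` extracts the column itself in both cases
identical(df[[1]], tb[[1]])   # TRUE
```

Because `[[` behaves the same for both classes, a function that may receive either kind of object is probably safer written with `[[` than with `[, j]`.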
On 21/09/2019 7:38 a.m., Jeff Newmiller wrote:
> The dplyr::select function returns a special variety of data.frame called a tibble.

I don't think that's always true. The docs say it returns "An object of the same class as .data.", and that's what I'm seeing:

> str(data.frame(a=c(1,1,2,2), b=1:4) %>% subset(a == 1))
'data.frame': 2 obs. of 2 variables:
 $ a: num 1 1
 $ b: int 1 2

But I believe there are other dplyr functions that take dataframes as input and return tibbles, I just don't know which ones.

Duncan Murdoch
Your use of subset instead of select does not help, but a corrected example does indeed confirm your point.

library(dplyr)
str(data.frame(a=c(1,1,2,2), b=1:4) %>% select(b,a))
## 'data.frame': 4 obs. of 2 variables:
## $ b: int 1 2 3 4
## $ a: num 1 1 2 2

However the `[` issue is still worth addressing. If that does not fix the problem then a dput(head(troublesomedata)) from Zachary will be needed to figure out what actually is going on.

On September 21, 2019 5:22:07 AM PDT, Duncan Murdoch <murdoch.duncan at gmail.com> wrote:
> On 21/09/2019 7:38 a.m., Jeff Newmiller wrote:
>> The dplyr::select function returns a special variety of data.frame called a tibble.
>
> I don't think that's always true. The docs say it returns "An object of the same class as .data.", and that's what I'm seeing:
>
> > str(data.frame(a=c(1,1,2,2), b=1:4) %>% subset(a == 1))
> 'data.frame': 2 obs. of 2 variables:
>  $ a: num 1 1
>  $ b: int 1 2
>
> But I believe there are other dplyr functions that take dataframes as input and return tibbles, I just don't know which ones.
>
> Duncan Murdoch
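One way to check what select() hands to ctab() is to look at the class of its result directly. The snippet below is a rough sketch, assuming dplyr and tibble are installed; df is a made-up stand-in, since the original data was never posted.

```r
library(dplyr)

# Hypothetical stand-in for the poster's data
df <- data.frame(col1 = c("x", "y"), col2 = c("a", "b"))

# select() keeps the class of its input: a plain data.frame stays one ...
class(df %>% select(col1, col2))
# ... while a tibble stays a tibble
class(tibble::as_tibble(df) %>% select(col1, col2))

# A small, shareable snapshot of the data for a reproducible example
dput(head(df))
```

If the first call reports "tbl_df" for the real data, the object was already a tibble before select() was applied, which would be consistent with the error from `[`.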