Hello!

Question. I'm dealing with a large Excel sheet that I'm trying to tidy and then visualize, and I'm wondering how I might specify the data I'm visualizing.

Here's the data frame I'm working with:

> str(unclean_data)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 1909 obs. of 9 variables:
 $ unique identifier: num 1 1 1 1 1 1 1 1 1 1 ...
 $ question : num 1 2 2 2 2 2 2 3 3 3 ...
 $ grid text : chr "******* and his family have lived and worked in ******* for 6 years." "******* contributes to public safety while also organizing community events. He said he hosts Trunk or Treat, en"| __truncated__ "******* did not know the origin or history of ******* PD, but he said it is integral to the safety of the area." "The ******* PD ensures safety, he said, while also familiarizing themselves with the town’s people. He said ev"| __truncated__ ...

The most important column is the $`grid text` one, and I know how to extract that:

> text_df_APPLIED <- tibble(line = 1:1909, text = unclean_data$`grid text`)

But my question is, what if I only wanted to extract stuff from the $grid text column that was itself only correlated with the number 3 in the $question column? So, instead of tidying the whole $`grid text` column, I want to tidy only a smaller portion of it: the rows whose value in the $question column is 3.

Is there a way to do that in this line of code:

> text_df_APPLIED <- tibble(line = 1:1909, text = unclean_data$`grid text`)

Or do I have to FIRST shorten the $`grid text` column (to only the rows where $question is 3) BEFORE I even begin to tidy it?

I'm working with these libraries right now, if it helps:

library(tidytext)
library(dplyr)
library(stringr)

D
Hi Drake,

This is a guess on my part, but what about:

q3only <- unclean_data[unclean_data$question == 3, ]

then perform your operations on q3only.

Jim

On Thu, Jul 2, 2020 at 8:35 PM Drake Gossi <drake.gossi at gmail.com> wrote:
> But my question is, what if I only wanted to extract stuff from the
> $grid text column that was itself only correlated with the number 3 in
> the $question column?
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
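For anyone who wants to try the base R subsetting approach without the original sheet, here is a minimal sketch. The toy `unclean_data` and its rows are invented for illustration; only the column names `question` and `grid text` come from the thread. It uses `data.frame()` rather than `tibble()` so that it runs without extra packages.

```r
# Toy stand-in for the real data; the rows are invented for illustration.
unclean_data <- data.frame(
  question = c(1, 2, 2, 3, 3, 3),
  `grid text` = c("a", "b", "c", "d", "e", "f"),
  check.names = FALSE,        # keep the space in the column name
  stringsAsFactors = FALSE
)

# Keep only the rows where question is 3 (all columns are retained).
q3only <- unclean_data[unclean_data$question == 3, ]

# Build the line/text table from the subset. seq_len(nrow()) adapts to
# however many rows survive the filter, unlike a hard-coded 1:1909.
text_df_q3 <- data.frame(
  line = seq_len(nrow(q3only)),
  text = q3only$`grid text`,
  stringsAsFactors = FALSE
)
```

So the answer to "do I have to shorten the column first" is effectively yes: the subset is taken first, and the line numbers are then generated from the subset's own row count.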
Hello,

Maybe the following is what you are looking for.

unclean_data %>%
  filter(question == 3) %>%
  mutate(line = row_number()) %>%
  select(line, `grid text`)

Hope this helps,

Rui Barradas

At 23:47 on 01/07/2020, Drake Gossi wrote:
> But my question is, what if I only wanted to extract stuff from the
> $grid text column that was itself only correlated with the number 3 in
> the $question column?
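A dplyr pipeline of this kind can be checked on a small example. The sketch below assumes a toy `unclean_data` with invented rows. The rename to `text` inside `select()` is an addition for illustration, so that the result has the same `line` and `text` columns as the `tibble()` call in the original question.

```r
library(dplyr)

# Toy stand-in for the real data; the rows are invented for illustration.
unclean_data <- data.frame(
  question = c(1, 2, 3, 3),
  `grid text` = c("first answer", "second answer",
                  "third answer", "fourth answer"),
  check.names = FALSE,        # keep the space in the column name
  stringsAsFactors = FALSE
)

text_df_q3 <- unclean_data %>%
  filter(question == 3) %>%           # keep only question 3 rows
  mutate(line = row_number()) %>%     # renumber lines within the subset
  select(line, text = `grid text`)    # select() can rename as it selects
```

Because `row_number()` runs after `filter()`, the line numbers restart at 1 for the subset rather than keeping their positions from the full sheet.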
Thank you very much, Jim and Rui. The line that ended up working for me was this:

> ed_exp3 <- unclean_data[which(unclean_data$question == 3), "grid text"]

However, as I read and study Jim's and Rui's code, I see how those would work too.

Thank you all again!

On Thu, Jul 2, 2020 at 5:07 AM Rui Barradas <ruipbarradas at sapo.pt> wrote:
> Maybe the following is what you are looking for.
>
> unclean_data %>%
>   filter(question == 3) %>%
>   mutate(line = row_number()) %>%
>   select(line, `grid text`)

--
Drake Gossi
PhD Student
University of Texas at Austin
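One practical difference between selecting rows with `which()` and selecting them with a bare logical test shows up when the `question` column contains missing values. The sketch below uses an invented toy data frame (a plain `data.frame`, not a tibble) to illustrate it.

```r
# Toy data with one missing question value; rows are invented.
unclean_data <- data.frame(
  question = c(1, 3, NA, 3),
  `grid text` = c("a", "b", "c", "d"),
  check.names = FALSE,        # keep the space in the column name
  stringsAsFactors = FALSE
)

# Logical indexing: the NA in question propagates and yields an NA entry.
by_logical <- unclean_data[unclean_data$question == 3, "grid text"]

# which() returns only the positions where the test is TRUE,
# so the row with the missing question value is dropped.
by_which <- unclean_data[which(unclean_data$question == 3), "grid text"]
```

With a plain data frame, selecting a single column this way returns a character vector. A tibble keeps its one-column tibble shape under the same indexing, so the class of the result depends on how the sheet was read in.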