I'm attempting to do some content analysis on a few million tweets, but I can't seem to get them cleaned correctly.

I'm trying to replicate the process outlined here: https://stackoverflow.com/questions/46734501/opposite-of-unnest-tokens

My code:

tweets %>%
  unnest_tokens(word, text, token = 'tweets') %>%
  filter(!word %in% stop_words$word) %>%
  nest(word) %>%
  mutate(text = map(data, unlist),
         text = map_chr(text, paste, collapse = " ")) -> tweets

Unfortunately, I keep getting:

Error in mutate_impl(.data, dots) :
  Evaluation error: cannot coerce type 'closure' to vector of type 'character'.

What am I doing wrong?

Here's what the dataset looks like:

> glimpse(tweets)
Observations: 389,253
Variables: 12
$ status_id   "x1047841705729306624", "x1046966595610927105", "x104709...
$ created_at  "2018-10-04T13:31:45Z", "2018-10-02T03:34:22Z", "2018-10...
$ text        "Technique is everything with olympic lifts ! @ Body By ...
$ lat         43.68359, 40.28412, 37.77066, 40.43139, 31.16889, 33.937...
$ lng         -70.32841, -83.07859, -122.43598, -79.98069, -100.07689,...
$ county_name "Cumberland County", "Delaware County", "San Francisco C...
$ fips        23005, 39041, 6075, 42003, 48095, 6037, 6037, 55073, 482...
$ state_name  "Maine", "Ohio", "California", "Pennsylvania", "Texas", ...
$ state_abb   "ME", "OH", "CA", "PA", "TX", "CA", "CA", "WI", "TX", "A...
$ urban_level "Medium Metro", "Large Fringe Metro", "Large Central Met...
$ urban_code  3, 2, 1, 1, 6, 1, 1, 4, 1, 3, 2, 2, 1, 3, 6, 1, 1, 2, 3,...
$ population  277308, 184029, 830781, 1160433, 4160, 9509611, 9509611,...

--

Nate Parsons
Pronouns: He, Him, His
Graduate Teaching Assistant
Department of Sociology
Portland State University
Portland, Oregon

503-725-9025
503-725-3957 FAX
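For reference, one hedged way to write the re-nest step is to assign `text` only once, collapsing each nested group inside a single map_chr() call. This is only a sketch on a made-up two-row tibble, run with the default word tokenizer rather than token = 'tweets', and it has not been tested against the real data, so it may or may not resolve the error above.

library(dplyr)
library(tidyr)
library(purrr)
library(tidytext)

## toy stand-in for the real tweets; both rows are invented for illustration
tweets <- tibble(
  status_id = c("x1", "x2"),
  text = c("Technique is everything with olympic lifts",
           "Cleaning a few million tweets takes patience")
)

tweets %>%
  unnest_tokens(word, text) %>%          # default word tokenizer (the thread uses token = 'tweets')
  filter(!word %in% stop_words$word) %>%
  nest(word) %>%                         # tokens land in a list-column named 'data'
                                         # (newer tidyr spells this nest(data = word))
  mutate(text = map_chr(data, ~ paste(.x$word, collapse = " ")))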
Hi Nate,

You've made it pretty difficult to answer your question. Please see
https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
and follow some of the suggestions you find there to make it easier on
those who want to help you.

Best,
Ista

On Mon, Oct 15, 2018 at 10:56 PM Nathan Parsons
<nathan.f.parsons at gmail.com> wrote:
>
> I'm attempting to do some content analysis on a few million tweets, but I can't seem to get them cleaned correctly.
> [...]
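In practice that usually means posting the library() calls, a handful of rows that paste cleanly into a fresh R session, and the exact code that triggers the error. A sketch of what such a snippet could look like follows; the second tweet text and the object name mini_tweets are invented for illustration.

library(dplyr)      # tibble() is re-exported here

## a tiny stand-in slice of the data that anyone can copy into a fresh session
mini_tweets <- tibble(
  status_id = c("x1047841705729306624", "x1046966595610927105"),
  text = c("Technique is everything with olympic lifts !",
           "placeholder text for a second tweet")
)

dput(mini_tweets)   # paste this output (or the tibble() call itself) into the question
sessionInfo()       # and report the package versions in use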
Ista -

I provided data, code, and the error being returned, as per reproducible R protocol. I did not include packages, however: unnest_tokens() is from the tidytext package, map()/map_chr() are from purrr, and everything else is from the tidyverse (dplyr/tidyr/etc.). Not sure what else I can provide to make this more clear.

--

Nate Parsons
Pronouns: He, Him, His
Graduate Teaching Assistant
Department of Sociology
Portland State University
Portland, Oregon

503-725-9025
503-725-3957 FAX

On Oct 16, 2018, 12:35 PM -0700, Ista Zahn <istazahn at gmail.com>, wrote:
> Hi Nate,
>
> You've made it pretty difficult to answer your question. Please see
> https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example
> and follow some of the suggestions you find there to make it easier on
> those who want to help you.
> [...]
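For completeness, the package setup Nate describes corresponds to roughly these library() calls (current CRAN packages assumed):

library(dplyr)     # filter(), mutate(), and the %>% pipe
library(tidyr)     # nest()
library(purrr)     # map(), map_chr()
library(tidytext)  # unnest_tokens() and the stop_words data set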