Hello,

I am trying to read a set of JSON files containing tweets using the following code:

json_data <- fromJSON(paste(readLines(json_file))

Unfortunately, it only reads the first record in the file. For example, in the file below, it only reads the first record, starting with "id":"tag:search.twitter.com,2005:3318539389". What is the best way to retrieve all of these records? I have 20 such JSON files with varying numbers of tweets in them.

Thank you in advance.

Best,
Mayukh

{"id":"tag:search.twitter.com ,2005:3318539389","objectType":"activity","actor":{"objectType":"person","id":"id: twitter.com:2859421","link":"http://www.twitter.com/meetjenn","displayName":"Jenn","postedTime":"2007-01-29T17:06:00.000Z","image":"06-19-07_2010.jpg","summary":"I say 'like' a lot. I fall down a lot. I walk into everything. Love Pgh Pens, NE Pats, Fundraising, Dogs & History. Craft Beer & Running Novice.","links":[{"href":"http://meetjenn.tumblr.com","rel":"me"}],"friendsCount":0,"followersCount":0,"listedCount":0,"statusesCount":0,"twitterTimeZone":"Eastern Time (US & Canada)","verified":false,"utcOffset":"0","preferredUsername":"meetjenn","languages":["en"],"location":{"objectType":"place","displayName":"Pgh/Philajersey"},"favoritesCount":0},"verb":"post","postedTime":"2009-08-15T00:00:12.000Z","generator":{"displayName":"tweetdeck","link":" http://twitter.com "},"provider":{"objectType":"service","displayName":"Twitter","link":" http://www.twitter.com"},"link":" http://twitter.com/meetjenn/statuses/3318539389","body":"Cool story about the man who created the @Starbucks logo. Additional link at the bottom on how it came to be: http://bit.ly/16bOJk ","object":{"objectType":"note","id":"object:search.twitter.com,2005:3318539389","summary":"Cool story about the man who created the @Starbucks logo. Additional link at the bottom on how it came to be: http://bit.ly/16bOJk","link":" http://twitter.com/meetjenn/statuses/3318539389 ","postedTime":"2009-08-15T00:00:12.000Z"},"twitter_entities":{"urls":[{"expanded_url":null,"indices":[111,131],"url":" http://bit.ly/16bOJk "}],"hashtags":[],"user_mentions":[{"id":null,"name":null,"indices":[41,51],"screen_name":"@Starbucks","id_str":null}]},"retweetCount":0,"gnip":{"matching_rules":[{"value":"Starbucks","tag":null}]}}
{"id":"tag:search.twitter.com ,2005:3318543260","objectType":"activity","actor":{"objectType":"person","id":"id: twitter.com:61595468","link":"http://www.twitter.com/FastestFood","displayName":"FastFood Bob","postedTime":"2009-01-30T20:51:10.000Z","image":"","summary":"Just A little food for thought","links":[{"href":"http://www.TeamSantilli.com","rel":"me"}],"friendsCount":0,"followersCount":0,"listedCount":0,"statusesCount":0,"twitterTimeZone":"Pacific Time (US & Canada)","verified":false,"utcOffset":"0","preferredUsername":"FastestFood","languages":["en"],"location":{"objectType":"place","displayName":"eating some thoughts"},"favoritesCount":0},"verb":"post","postedTime":"2009-08-15T00:00:23.000Z","generator":{"displayName":"oauth:17","link":" http://twitter.com "},"provider":{"objectType":"service","displayName":"Twitter","link":" http://www.twitter.com"},"link":" http://twitter.com/FastestFood/statuses/3318543260","body":"Oregon Biz Report ? How Starbucks saved millions. Oregon closures ... http://u.mavrev.com/02bdj","object":{"objectType":"note","id":"object: search.twitter.com,2005:3318543260","summary":"Oregon Biz Report ? How Starbucks saved millions. Oregon closures ... http://u.mavrev.com/02bdj ","link":"http://twitter.com/FastestFood/statuses/3318543260 ","postedTime":"2009-08-15T00:00:23.000Z"},"twitter_entities":{"urls":[{"expanded_url":null,"indices":[70,95],"url":" http://u.mavrev.com/02bdj "}],"hashtags":[],"user_mentions":[]},"retweetCount":0,"gnip":{"matching_rules":[{"value":"Starbucks","tag":null}]}}
{"info":{"message":"Replay Request Completed","sent":"2015-02-18T00:05:15+00:00","activity_count":2}}
Mayukh,

I think you are missing an argument to paste() and a right parenthesis character.

Try

json_data <- fromJSON(paste(readLines(json_file), collapse = " "))

Mark

R. Mark Sharp, Ph.D.
msharp at TxBiomed.org

> On Jul 27, 2015, at 3:41 PM, Mayukh Dass <mayukh.dass at gmail.com> wrote:
>
> I am trying to read a set of json files containing tweets using the
> following code:
>
> json_data <- fromJSON(paste(readLines(json_file))
>
> Unfortunately, it only reads the first record on the file.
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
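A small base-R sketch may help show what the collapse argument changes. The temporary file and its two-line content are made up for illustration and are not taken from the tweet files; the point is only that readLines() returns one string per line, and paste() without collapse leaves that vector as it is.

```r
# Hypothetical two-line file standing in for one JSON object that was
# wrapped across lines (not the real tweet data).
tmp <- tempfile(fileext = ".txt")
writeLines(c('{"a": 1,', '"b": 2}'), tmp)

lines <- readLines(tmp)

# Without collapse, paste() returns one element per input line.
not_collapsed <- paste(lines)
length(not_collapsed)

# With collapse, the lines are joined into a single string.
collapsed <- paste(lines, collapse = " ")
length(collapsed)
collapsed
```

Collapsing only changes how the text is handed to the parser. A file that holds several top-level objects one after another is still not a single JSON document, so the objects have to be separated before each one is parsed.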
Thanks Mark. I made a mistake when I was copying the code into the email. I have the parentheses in my code.

Best,
Mayukh

> On Jul 27, 2015, at 5:16 PM, Mark Sharp <msharp at TxBiomed.org> wrote:
>
> Mayukh,
>
> I think you are missing an argument to paste() and a right parenthesis character.
>
> Try
> json_data <- fromJSON(paste(readLines(json_file), collapse = " "))
>
> Mark
Mayukh,

I apologize for taking so long to get back to your problem. I expect you may have found the solution. If so, I would be interested to hear it. I have developed a hack to solve the problem, but I expect that someone who knew how to handle JSON objects or even text parsing better could develop a more elegant solution.

As I understand the problem, your text file has more than one JSON object in text form. There are three. The first two are very similar, and the last is a trailer indicating what was done, when it was done, and the number of JSON objects sent.

The problem is that fromJSON() only pulls off the first of the JSON objects.

I have defined three helper functions to separate the JSON objects, read them in, and store them in a list.

library(RJSONIO)
library(stringi, quietly = TRUE)
#library(jsonlite) # also works

#' Returns dataframe with ordered locations of the matching braces.
#'
#' There is almost certainly a better function to do this.
#' @param txt character vector of length one having 0 or more matching braces.
#' @import stringi
#' @examples
#' library(rmsutilityr)
#' match_braces("{123{456{78}9}10}")
#' @export
match_braces <- function(txt) {
  txt <- txt[1] # just in the case of having more than one element
  left <- stri_locate_all_regex(txt, "\\{")[[1]][ , 1]
  right <- stri_locate_all_regex(txt, "\\}")[[1]][ , 2]
  len <- length(left)
  braces <- data.frame(left = rep(0, len), right = rep(0, len))
  # Pair each closing brace with the nearest unused opening brace before it.
  for (i in seq_along(right)) {
    for (j in rev(seq_along(left))) {
      if (left[j] < right[i] && left[j] != 0) {
        braces$left[i] <- left[j]
        braces$right[i] <- right[i]
        left[j] <- 0 # mark this opening brace as used
        break
      }
    }
  }
  braces[order(braces$left), ]
}

#' Returns a list containing two objects in the text of a character vector
#' of length one: (1) object = the first json object found and (2) remainder =
#' the remaining text.
#'
#' Properly formed messages are assumed. Error checking is non-existent.
#' @param json_txt character vector of length one having one or more JSON
#' objects in character form.
#' @import stringi
#' @export
get_first_json_message <- function(json_txt) {
  len <- stri_length(json_txt)
  braces <- match_braces(json_txt)
  if (braces$right[1] + 1 > len) {
    remainder <- ""
  } else {
    remainder <- stri_trim_both(stri_sub(json_txt, braces$right[1] + 1))
  }
  list(object = stri_sub(json_txt, braces$left[1], to = braces$right[1]),
       remainder = remainder)
}

#' Returns list of lists made by call to fromJSON()
#' @param json_txt character vector of length 1 having one or more
#' JSON objects in text form.
#' @import stringi
#' @export
get_json_list <- function (json_txt) {
  t_json_txt <- json_txt
  i <- 0
  json_list <- list()
  repeat {
    i <- i + 1
    message_remainder <- get_first_json_message(t_json_txt)
    json_list[i] <- list(fromJSON(message_remainder$object))
    if (message_remainder$remainder == "")
      break
    t_json_txt <- message_remainder$remainder
  }
  json_list
}

json_file <- "../data/json_file.txt"
json_txt <- stri_trim_both(stri_c(readLines(json_file), collapse = " "))
json_list <- get_json_list(json_txt)
length(json_list)

R. Mark Sharp, Ph.D.
Director of Primate Records Database
Southwest National Primate Research Center
Texas Biomedical Research Institute
P.O. Box 760549
San Antonio, TX 78245-0549
Telephone: (210)258-9476
e-mail: msharp at TxBiomed.org
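The brace-matching idea can also be sketched in base R as a single pass that tracks nesting depth, which avoids rescanning the remaining text for every object and ignores braces that occur inside string values. This is only a sketch under assumptions: split_json_objects() and the sample text are hypothetical names and data introduced here, the input is assumed to be well formed, and the parsing step is guarded because it depends on jsonlite being installed.

```r
# Hypothetical helper: split a string holding several top-level JSON objects
# into one string per object by tracking brace depth.
split_json_objects <- function(txt) {
  chars <- strsplit(txt, "", fixed = TRUE)[[1]]
  depth <- 0L            # current nesting level of { }
  in_string <- FALSE     # TRUE while inside a double-quoted JSON string
  escaped <- FALSE       # TRUE right after a backslash inside a string
  start <- NA_integer_   # position where the current top-level object began
  pieces <- character(0)
  for (i in seq_along(chars)) {
    ch <- chars[i]
    if (in_string) {
      if (escaped) {
        escaped <- FALSE
      } else if (ch == "\\") {
        escaped <- TRUE
      } else if (ch == "\"") {
        in_string <- FALSE
      }
    } else if (ch == "\"") {
      in_string <- TRUE
    } else if (ch == "{") {
      if (depth == 0L) start <- i
      depth <- depth + 1L
    } else if (ch == "}") {
      depth <- depth - 1L
      if (depth == 0L) {
        # A top-level object just closed; keep its text.
        pieces <- c(pieces, paste(chars[start:i], collapse = ""))
      }
    }
  }
  pieces
}

# Made-up sample: three objects, the first with a brace inside a string.
txt <- '{"a":{"b":"x}y"}} {"c":[1,2]} {"info":{"count":2}}'
objs <- split_json_objects(txt)
length(objs)

# Parse each piece only if jsonlite is available.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  parsed <- lapply(objs, jsonlite::fromJSON)
}
```

For the 20 files, the same function could be applied to each file's collapsed text, for example with lapply() over the result of list.files(); the directory and file pattern would need to match the actual data.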