Dear all,

I am new to R in general, and to retrieving XML or JSON data in particular. I have tried to get the information through the XML package and various websites without being able to do exactly what I want. I hope one of you can give me some help.

I want to retrieve information about movies from IMDB, or rather from the unofficial API, www.imdbapi.com. I have a vector with a lot of movie IDs in the IMDB format. To give just a few:

ids <- c("tt0110074", "tt0096184", "tt0081568", "tt0448134", "tt0079367")

Now I want to create a data frame where each movie corresponds to one row and the other columns hold the information retrieved from the API. This can be retrieved either as XML or as JSON, e.g.

JSON: http://www.imdbapi.com/?i=tt0110074&tomatoes=TRUE
XML: http://www.imdbapi.com/?i=tt0110074&r=XML&tomatoes=TRUE

where i is the movie ID, i.e. the information I have in my vector. The IDs are all in the format ttXXXXXXX.

I have tried the XML package, but I have not been able to get the data into a workable data frame. I take for granted that this is due to my limited knowledge of R, and I aim to learn more, but right now I am in a bit of a hurry: this unofficial API will be taken off the web in a few days, and I really want to create this data frame for further analysis.

Any help would be greatly appreciated.

All the best,
Richard O
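Because the request URLs differ only in the i= parameter, they can be built for every ID at once. A minimal base-R sketch, assuming only the URL patterns quoted above; the names base, json_urls and xml_urls are illustrative, and the check that each ID is "tt" followed by seven digits is an assumption read off the examples.

```r
# IMDB ids from the question
ids <- c("tt0110074", "tt0096184", "tt0081568", "tt0448134", "tt0079367")

# Assumption: every id is "tt" followed by exactly seven digits
stopifnot(all(grepl("^tt[0-9]{7}$", ids)))

# Base address of the unofficial API, as given in the question
base <- "http://www.imdbapi.com/"

# paste0() is vectorised over ids, so one call builds a URL per movie
json_urls <- paste0(base, "?i=", ids, "&tomatoes=TRUE")
xml_urls  <- paste0(base, "?i=", ids, "&r=XML&tomatoes=TRUE")

json_urls[1]
# [1] "http://www.imdbapi.com/?i=tt0110074&tomatoes=TRUE"
```

Each element of json_urls can then be passed to whatever download function is used, one request per movie.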
You can try this one...

#####
library(RCurl)
library(rjson)

ids <- c("tt0110074", "tt0096184", "tt0081568", "tt0448134", "tt0079367")
titles <- data.frame()
for (i in seq_along(ids)) {
  # Build the JSON request URL for this movie id
  req <- paste("http://www.imdbapi.com/?i=", ids[i], "&tomatoes=TRUE", sep = "")
  u <- getURL(req)                            # download the response as a string
  j <- fromJSON(u)                            # parse the JSON into a named list
  titles <- rbind(titles, as.data.frame(j))   # append one row for this movie
}
#####

I am sure it can be done more efficiently...

2012/7/26 Richard Ohrvall <richard.ohrvall@gmail.com>
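On the efficiency point: calling rbind() inside the loop copies the growing data frame on every iteration. A common alternative is to build a list of one-row data frames with lapply() and bind them once with do.call(rbind, ...). The sketch below assumes a hypothetical helper, fetch_movie(), that stands in for getURL() + fromJSON() and returns made-up placeholder values so it runs offline; swap in the real calls to query the service.

```r
ids <- c("tt0110074", "tt0096184", "tt0081568", "tt0448134", "tt0079367")

# Hypothetical stand-in for getURL() + fromJSON(): returns the kind of named
# list that parsing one API response gives. The field values are placeholders.
# To query the real service, the body would become something like
#   fromJSON(getURL(paste0("http://www.imdbapi.com/?i=", id, "&tomatoes=TRUE")))
fetch_movie <- function(id) {
  list(imdbID = id, Title = paste("Title of", id), Year = "1994")
}

# Turn one parsed record into a one-row data frame of character columns
record_to_row <- function(rec) {
  as.data.frame(rec, stringsAsFactors = FALSE)
}

# Fetch and convert every id, then bind all rows in a single call instead of
# growing the data frame inside the loop
rows <- lapply(ids, function(id) record_to_row(fetch_movie(id)))
titles <- do.call(rbind, rows)

dim(titles)
# [1] 5 3
```

Keeping the columns as character (stringsAsFactors = FALSE) leaves type conversion, e.g. of Year, as a separate step after all rows are bound.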
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
On Thu, Jul 26, 2012 at 4:18 AM, Richard Ohrvall <richard.ohrvall at gmail.com> wrote:
> JSON:
> http://www.imdbapi.com/?i=tt0110074&tomatoes=TRUE

library(httr)
library(rjson)
fromJSON(text_content(GET("http://www.imdbapi.com/?i=tt0110074&tomatoes=TRUE")))

This will be a bit easier in the next version of httr:

content(GET("http://www.imdbapi.com/?i=tt0110074&tomatoes=TRUE"), type = "application/json")

See also https://github.com/hadley/data-movies, which I suspect is a faster approach than using an API.

Hadley

--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
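Whichever way the JSON is fetched, rbind() on data frames stops with an error if the rows do not share the same column names, which would happen if some movies came back with fields that others lack (whether this API ever did so is not established here). A minimal sketch, assuming each parsed response is a named list of scalar values: take the union of all field names and fill the gaps with NA before binding. The two records below are invented for illustration.

```r
# Two hypothetical parsed responses: the second lacks the "tomatoMeter" field
recs <- list(
  list(imdbID = "tt0110074", Title = "A", tomatoMeter = "80"),
  list(imdbID = "tt0096184", Title = "B")
)

# Union of all field names, in order of first appearance
fields <- unique(unlist(lapply(recs, names)))

# One row per record; absent fields become NA so every row has the same
# columns. Assumes each field value is a single (scalar) value.
rows <- lapply(recs, function(rec) {
  vals <- lapply(fields, function(f) {
    v <- rec[[f]]
    if (is.null(v)) NA_character_ else as.character(v)
  })
  names(vals) <- fields
  as.data.frame(vals, stringsAsFactors = FALSE)
})

movies <- do.call(rbind, rows)
movies
```

The same row-building function can replace as.data.frame(j) in the loop-based reply or record_to_row() in an lapply() version.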