Dear all,
I am new to R in general and to retrieving XML or JSON data in
particular. I have tried to get information through the XML package
and various websites without being able to do exactly what I want. I
hope someone can give me some help.
I want to retrieve information about movies from IMDB, or rather from the
unofficial API, www.imdbapi.com. I have a vector with a lot of movie IDs
following the IMDB standard. To give just a few:
ids <- c("tt0110074", "tt0096184", "tt0081568",
         "tt0448134", "tt0079367")
Now I want to create a data frame in which each movie corresponds to
one row and the other information is retrieved from the API. The data
can be retrieved either as XML or as JSON, e.g.
JSON:
http://www.imdbapi.com/?i=tt0110074&tomatoes=TRUE
XML:
http://www.imdbapi.com/?i=tt0110074&r=XML&tomatoes=TRUE
where i refers to the movie ID, i.e. the information I have in my
vector. The IDs are all in the format ttXXXXXXX.
I have tried to use the XML package, but I have not been able to get
the data into a workable data frame. I take for granted that this is
due to my limited knowledge of R and I aim to learn more, but right
now I am in a bit of a hurry since this unofficial API will be taken
off the web in a few days, and I really want to create this data frame
for further analysis.
Any help would be greatly appreciated.
All the best,
Richard O
You can try this one...
#####
library(RCurl)
library(rjson)
ids <- c("tt0110074", "tt0096184", "tt0081568",
         "tt0448134", "tt0079367")
titles <- data.frame()
# Request each ID in turn and append the parsed record as one row
for (i in seq_along(ids)) {
  req <- paste("http://www.imdbapi.com/?i=", ids[i],
               "&tomatoes=TRUE",
               sep = "")
  u <- getURL(req)   # raw JSON text
  j <- fromJSON(u)   # named list of fields
  titles <- rbind(titles, as.data.frame(j, stringsAsFactors = FALSE))
}
#####
I am sure it can be done more efficiently...
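One possible way to tighten the loop is to build all request URLs at once and bind the rows in a single step rather than growing the data frame inside the loop. The sketch below is only that: `build_urls`, `records_to_df` and `fetch_record` are hypothetical helper names, and the two records in the demonstration are placeholder values, not real API output. It assumes each parsed record is a flat named list, which is what `fromJSON` returns for a flat JSON object.

```r
# Build one request URL per ID; paste0() is vectorised over `ids`.
build_urls <- function(ids, base = "http://www.imdbapi.com/") {
  paste0(base, "?i=", ids, "&tomatoes=TRUE")
}

# Turn a list of parsed records (named lists) into one data frame.
# A field missing from a record becomes NA, so records need not match exactly.
records_to_df <- function(records) {
  fields <- unique(unlist(lapply(records, names)))
  rows <- lapply(records, function(rec) {
    vals <- lapply(fields, function(f) {
      v <- rec[[f]]
      if (is.null(v)) NA_character_ else as.character(v)[1]
    })
    names(vals) <- fields
    as.data.frame(vals, stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

# Hypothetical fetch step (needs RCurl, rjson and network access; not run here).
fetch_record <- function(url) {
  rjson::fromJSON(RCurl::getURL(url))
}

# Offline demonstration with placeholder records standing in for API responses.
ids  <- c("tt0110074", "tt0096184")
urls <- build_urls(ids)
placeholder <- list(
  list(imdbID = "tt0110074", Title = "placeholder A", Year = "0000"),
  list(imdbID = "tt0096184", Title = "placeholder B")
)
titles <- records_to_df(placeholder)
```

With a live connection, the equivalent call would be `records_to_df(lapply(build_urls(ids), fetch_record))`. Every column comes back as character, so numeric fields would still need converting afterwards.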
On Thu, Jul 26, 2012 at 4:18 AM, Richard Ohrvall <richard.ohrvall at gmail.com> wrote:
> JSON:
> http://www.imdbapi.com/?i=tt0110074&tomatoes=TRUE

library(httr)
library(rjson)
fromJSON(text_content(GET("http://www.imdbapi.com/?i=tt0110074&tomatoes=TRUE")))

This will be a bit easier in the next version of httr:

content(GET("http://www.imdbapi.com/?i=tt0110074&tomatoes=TRUE"), type = "application/json")

See also https://github.com/hadley/data-movies, which I suspect is a
faster approach than using an API.

Hadley
--
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/
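When the httr call is repeated over many IDs, a single failed request would normally stop the whole run. The following is a minimal sketch of one way to guard against that, assuming a current httr in which `content()` accepts `as` and `type`. `fetch_movie` is a hypothetical wrapper, and its `fetch` argument exists only so the request step can be swapped out, for example for offline testing.

```r
# Fetch and parse one movie record; return NULL instead of stopping on failure.
# `fetch` defaults to an httr-based request (assumption: httr is installed).
fetch_movie <- function(id,
                        fetch = function(url) {
                          resp <- httr::GET(url)
                          httr::stop_for_status(resp)  # turn HTTP errors into R errors
                          httr::content(resp, as = "parsed",
                                        type = "application/json")
                        }) {
  url <- paste0("http://www.imdbapi.com/?i=", id, "&tomatoes=TRUE")
  tryCatch(
    fetch(url),
    error = function(e) {
      # Report the failure but keep going with the remaining IDs.
      warning("request failed for ", id, ": ", conditionMessage(e))
      NULL
    }
  )
}

# Offline check with stub fetchers in place of the real request.
ok  <- fetch_movie("tt0110074", fetch = function(url) list(url = url))
bad <- suppressWarnings(
  fetch_movie("tt0110074", fetch = function(url) stop("simulated failure"))
)
```

Applied over the whole vector, `lapply(ids, fetch_movie)` would give a list in which failed requests show up as `NULL` entries that can be dropped before building the data frame.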