Hi,
I'm trying to query the GitHub API, and I'm running into some data munging
issues, so I was hoping someone on the list might advise.
Here's my code. To run it you need to replace client_id and client_secret with
your own authorization information for GitHub.
library(github)
library(RCurl)
library(httpuv)
library(jsonlite)
# Set up the query
ctx = interactive.login("client_id", "client_secret")
pull <- function(i){
  get.pull.request.files(owner = "rails", repo = "rails", id = i,
                         ctx = get.github.context(), per_page=1000)
}
data <-
read.csv(getURL("https://gist.githubusercontent.com/aronlindberg/a3d135a303664046c94a/raw/e42a0734ec4542eccf5f4d5bdeed5afbdd1720e9/pull_ids"),
sep = "\n")
list <- read.csv(textConnection(data), header = FALSE)
pull_lists <- lapply(list$V1, pull)
get_files <- function(pull_lists){
  sapply(pull_lists$content, "[[", "filename")
}
file_lists <- lapply(pull_lists, get_files)
Everything works fine until the last command, which generates:
Error in FUN(X[[1L]], ...) : subscript out of bounds
I've read here:
http://stackoverflow.com/questions/18461499/subscript-out-of-bounds-on-character-vector
which leads me to believe that the reason for the error is that when I run
file_lists <- lapply(pull_lists, get_files) some of the entries are missing.
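To illustrate what I think is happening, here is a minimal sketch with made-up data rather than real API output. The names ok_pull and bad_pull are hypothetical, and the shape of bad_pull (a content list holding a plain character string instead of per-file lists) is only my guess at what a failed or empty request might look like:

```r
# Toy stand-ins for two entries of pull_lists (made-up, not real GitHub output).
ok_pull  <- list(content = list(list(filename = "a.rb"), list(filename = "b.rb")))
# Assumption: a problematic entry holds a character string, not a list per file.
bad_pull <- list(content = list(message = "Not Found"))

get_files <- function(pull) {
  sapply(pull$content, "[[", "filename")
}

# Works: every element of content is a list with a "filename" entry.
ok_files <- get_files(ok_pull)

# Fails: "[[" is applied to a character string that has no such name.
# tryCatch() captures the error message instead of stopping the script.
bad_msg <- tryCatch(get_files(bad_pull), error = function(e) conditionMessage(e))
```

If the real data look like bad_pull for some pull requests, that would match the error I see, but I have not confirmed this against the actual responses.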
However, I cannot figure out how to clean up the data. I have tried something
along the lines of:
clean_files <- function(pull_lists){
  pull_lists$content[which(nchar(pull_lists$content)==NULL)]<-NA
}
clean_lists <- lapply(pull_lists, clean_files)
But that simply replaces *every* value with NA (similarly if I change ==NULL to
<1, or <2).
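Here is a small reproduction of that behaviour on made-up input (toy_pull is hypothetical, not real API output). My tentative reading is that the index is always empty and that the function simply returns the value of its final assignment, but I may be missing something:

```r
# Made-up input; the real pull_lists entries are much larger.
toy_pull <- list(content = list(list(filename = "a.rb"), list(filename = "b.rb")))

clean_files <- function(pull) {
  pull$content[which(nchar(pull$content) == NULL)] <- NA
}

# Comparing anything with NULL gives a zero-length logical vector,
# so which() selects no elements at all.
idx <- which(nchar(toy_pull$content) == NULL)

# The last expression in the function is an assignment, whose value is
# the right-hand side, so this is what lapply() collects for each entry.
result <- clean_files(toy_pull)
```

On this toy input idx is empty and result is a single NA, which looks like what I get for every entry of the real data.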
How can I make this code work?
Best,
Aron
-- 
Aron Lindberg
Doctoral Candidate, Information Systems
Weatherhead School of Management
Case Western Reserve University
aronlindberg.github.io