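I am not sure about getting the file names, but you are 'extending' the
data structure on each iteration, which is inefficient because every
rbind copies everything accumulated so far; try 'lapply' instead and
bind once at the end:

small.data <- do.call(rbind, lapply(mysites, function(.file) {
    # return NULL when an address is empty or unreadable; rbind ignores NULL
    tryCatch(read.table(.file, sep = ";", header = TRUE, as.is = TRUE,
                        fileEncoding = "windows-1252"),
             error = function(e) NULL)
}))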
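
A self-contained sketch of the same pattern that runs without network access, in case it helps to test the idea first. It assumes three temporary files (one of them empty) as stand-ins for the web addresses; the helper name read_one and the sample values are made up for illustration, and fileEncoding is left out because the sample files are plain ASCII.

```r
# Stand-ins for the remote CSV files: two with data, one empty.
# (Temporary files and sample values are made up for illustration.)
f1 <- tempfile(fileext = ".csv")
f2 <- tempfile(fileext = ".csv")
f3 <- tempfile(fileext = ".csv")
writeLines(c("id;valor", "1;10.5", "2;20"), f1)
file.create(f2)                       # empty file: read.table() fails on it
writeLines(c("id;valor", "3;7"), f3)
files <- c(f1, f2, f3)

# Hypothetical helper: read one file, return NULL if it cannot be read.
read_one <- function(.file) {
    tryCatch(read.table(.file, sep = ";", header = TRUE, as.is = TRUE),
             error = function(e) NULL)
}

pieces <- lapply(files, read_one)     # list of data frames and NULLs
small.data <- do.call(rbind, pieces)  # NULL entries are dropped by rbind
print(small.data)
```

The list is built first and bound in a single call, so the accumulated data frame is not copied once per file.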
On Fri, Dec 17, 2010 at 10:15 AM, Daniel <dmsilv at gmail.com> wrote:
> Hello all,
> Is there any way to get each file from a list of web addresses and
> aggregate them in a data frame?
> Otherwise I have to type 23 thousand web addresses into a long script like
> this:
>
> base1 <- read.table("site 1", sep=";", header=T,
> fileEncoding="windows-1252")
> base2 <- read.table("site 2", sep=";", header=T,
> fileEncoding="windows-1252")
>
> I need to download each .CSV file from each address in the list vector and
> row-bind them all into one big data frame.
> I also need to convert each object to UTF-8. Of course, many of the web
> sites in the list may be empty, so my loop needs to skip to the next
> address.
>
> My first attempt seems to work, but after one night and half a day
> it still had not finished, which is far too long for this task. Can
> somebody help me?
>
> Example with a few addresses:
>
> mysites <- c(
> "http://spce2010.tse.gov.br/spceweb.consulta.receitasdespesas2010/exportaReceitaCsvCandidato.action?sqCandidato=40000000613&sgUe=AM&cpfCnpjDoador=",
> "http://spce2010.tse.gov.br/spceweb.consulta.receitasdespesas2010/exportaReceitaCsvCandidato.action?sqCandidato=40000000620&sgUe=AM&cpfCnpjDoador=",
> "http://spce2010.tse.gov.br/spceweb.consulta.receitasdespesas2010/exportaReceitaCsvCandidato.action?sqCandidato=40000000259&sgUe=AM&cpfCnpjDoador=",
> "http://spce2010.tse.gov.br/spceweb.consulta.receitasdespesas2010/exportaReceitaCsvCandidato.action?sqCandidato=250000002241&sgUe=SP&cpfCnpjDoador=",
> "http://spce2010.tse.gov.br/spceweb.consulta.receitasdespesas2010/exportaReceitaCsvCandidato.action?sqCandidato=250000002438&sgUe=SP&cpfCnpjDoador=",
> "http://spce2010.tse.gov.br/spceweb.consulta.receitasdespesas2010/exportaReceitaCsvCandidato.action?sqCandidato=40000000257&sgUe=AM&cpfCnpjDoador=",
> "http://spce2010.tse.gov.br/spceweb.consulta.receitasdespesas2010/exportaReceitaCsvCandidato.action?sqCandidato=120000000162&sgUe=MS&cpfCnpjDoador="
> )
>
> big.data <- NULL
> base <- NULL
> for (i in mysites) {
>   try(base <- read.table(i, sep=";", header=T, as.is=T,
>       fileEncoding="windows-1252"), TRUE)
>   if(!is.null(base)) big.data <- rbind(big.data, base)
> }
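
One thing to watch in the loop above: 'base' is set to NULL only once, before the loop, so when a read fails, try() leaves the previous data frame in 'base' and it gets bound a second time. A minimal sketch of the same loop with the reset moved inside, assuming temporary files (made up for illustration, as is the name mysites.demo) in place of the web addresses:

```r
# Temporary files stand in for the web addresses; the middle one is empty.
good1 <- tempfile(); writeLines(c("id;valor", "1;10"), good1)
empty <- tempfile(); file.create(empty)
good2 <- tempfile(); writeLines(c("id;valor", "2;20"), good2)
mysites.demo <- c(good1, empty, good2)

big.data <- NULL
for (i in mysites.demo) {
    base <- NULL   # reset so a failed read cannot reuse the previous result
    try(base <- read.table(i, sep = ";", header = TRUE, as.is = TRUE),
        silent = TRUE)
    if (!is.null(base)) big.data <- rbind(big.data, base)
}
print(big.data)    # one row per readable file; the empty file is skipped
```

This only fixes the duplicated rows; the loop still grows the data frame one rbind at a time, so it stays slow for thousands of addresses.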
>
> --
> Daniel Marcelino
> Skype: dmsilv
> http://marcelino.pbworks.com/
>
--
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?