Rui,
Many thanks for your reply and the code. I was not
expecting that so much work would be required. It worked perfectly.
The only thing I needed to do was create a Temp folder in the Documents folder.
Thanks again,
Bob
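
The manual step mentioned above (creating the Temp folder by hand) can probably be avoided from within R. The following is only a sketch: ensure_dir() is a hypothetical helper name, not something from the quoted code or from any package, and it relies on base R's dir.create() with recursive = TRUE to create missing parent folders as well.

```r
# ensure_dir() is a hypothetical helper, not part of any package.
ensure_dir <- function(path) {
  path <- path.expand(path)
  if (!dir.exists(path)) {
    # recursive = TRUE also creates any missing parent folders
    dir.create(path, recursive = TRUE)
  }
  # report whether the folder exists after the attempt
  dir.exists(path)
}

# Demonstration in a throwaway location; for the quoted script one would
# instead call ensure_dir("~/Temp") before the loop that creates the
# sub-folders.
demo_dir <- file.path(tempdir(), "Temp", "Hanson1")
ensure_dir(demo_dir)
```

On Windows, "~" usually expands to the Documents folder, which would explain why the Temp folder had to be created there.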
At 03:52 PM 7/26/2023, Rui Barradas wrote:
>At 23:06 on 25/07/2023, Bob Green wrote:
>>Hello,
>>I am seeking advice as to how I can download
>>the 833 files from this site:
>>"http://home.brisnet.org.au/~bgreen/Data/"
>>I want to be able to download them to perform a textual analysis.
>>If the 833 files, which are in a directory with
>>two subfolders, were on my computer, I could read
>>them with readtext. Using readtext on the URL, I get the error:
>> > x = readtext("http://home.brisnet.org.au/~bgreen/Data/*")
>>Error in download_remote(file, ignore_missing, cache, verbosity) :
>>  Remote URL does not end in known extension. Please download the file manually.
>> > x = readtext("http://home.brisnet.org.au/~bgreen/Data/Dir/()")
>>Error in download_remote(file, ignore_missing, cache, verbosity) :
>>  Remote URL does not end in known extension. Please download the file manually.
>>Any suggestions are appreciated.
>>Bob
>>______________________________________________
>>R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>https://stat.ethz.ch/mailman/listinfo/r-help
>>PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>and provide commented, minimal, self-contained, reproducible code.
>Hello,
>
>The following code downloads all files in the posted link.
>
>
>
>suppressPackageStartupMessages({
>  library(rvest)
>})
>
># destination directory, change this at will
>dest_dir <- "~/Temp"
>
># first get the two subfolders from the Data webpage
>link <- "http://home.brisnet.org.au/~bgreen/Data/"
>page <- read_html(link)
>page %>%
>  html_elements("a") %>%
>  html_text() %>%
>  grep("/$", ., value = TRUE) -> sub_folder
>
># create relevant disk sub-directories, if
># they do not exist yet
>for(subf in sub_folder) {
>  d <- file.path(dest_dir, subf)
>  if(!dir.exists(d)) {
>    success <- dir.create(d)
>    msg <- paste("created directory", d, "-", success)
>    message(msg)
>  }
>}
>
># prepare to download the files
>dest_dir <- file.path(dest_dir, sub_folder)
>source_url <- paste0(link, sub_folder)
>
>success <- mapply(\(src, dest) {
>  # read each Data subfolder
>  # and get the file names therein
>  # then lapply 'download.file' to each filename
>  pg <- read_html(src)
>  pg %>%
>    html_elements("a") %>%
>    html_text() %>%
>    grep("\\.txt$", ., value = TRUE) %>%
>    lapply(\(x) {
>      s <- paste0(src, x)
>      d <- file.path(dest, x)
>      tryCatch(
>        download.file(url = s, destfile = d),
>        warning = function(w) w,
>        error = function(e) e
>      )
>    })
>}, source_url, dest_dir)
>
>lengths(success)
># http://home.brisnet.org.au/~bgreen/Data/Hanson1/
># 84
># http://home.brisnet.org.au/~bgreen/Data/Hanson2/
># 749
>
># matches the question's number
>sum(lengths(success))
># [1] 833
>
>
>
>Hope this helps,
>
>Rui Barradas
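
Once the files are on disk, the textual analysis asked about in the original question needs them read back in. The following is a minimal base-R sketch, not part of the quoted code: read_txt_tree() is a hypothetical helper name, the demo folder and file names are made up for illustration, and the files are assumed to be plain text in an encoding that readLines() handles by default.

```r
# read_txt_tree() is a hypothetical helper, not part of any package.
# It collects every .txt file below 'root' (sub-folders included) and
# returns one row per file with the file name and its full text.
read_txt_tree <- function(root) {
  files <- list.files(root, pattern = "\\.txt$",
                      recursive = TRUE, full.names = TRUE)
  texts <- vapply(files, function(f) {
    paste(readLines(f, warn = FALSE), collapse = "\n")
  }, character(1))
  data.frame(doc_id = basename(files),
             text = unname(texts),
             stringsAsFactors = FALSE)
}

# Demonstration on a small made-up tree in a temporary folder;
# for the downloaded data one would call read_txt_tree("~/Temp").
root <- file.path(tempdir(), "demo_data")
dir.create(file.path(root, "Hanson1"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "Hanson2"), recursive = TRUE, showWarnings = FALSE)
writeLines(c("first line", "second line"), file.path(root, "Hanson1", "a.txt"))
writeLines("only line", file.path(root, "Hanson2", "b.txt"))

docs <- read_txt_tree(root)
nrow(docs)
```

The vector returned by list.files() should also be usable as input to readtext(), since the files are then local and end in a known extension.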