sir after that I want to run:

# get the list of sample names
GSMnames <- t(list.files("~/Desktop/GSE162562_RAW", full.names = FALSE))

# remove .txt from file/sample names
GSMnames <- gsub(pattern = ".txt", replacement = "", GSMnames)

# make a vector of the list of files to aggregate
files <- list.files("~/Desktop/GSE162562_RAW", full.names = TRUE)

but it is not running, because after utils::untar(FILE, exdir = dirname(FILE)) the archive expands into another 108 compressed (.gz) files.

On Tue, Aug 24, 2021 at 2:03 AM Andrew Simmons <akwsimmo at gmail.com> wrote:

> Hello,
>
> I tried downloading that file using 'utils::download.file' (which worked), but 'utils::untar' kept complaining about a "damaged archive". However, it did work when I downloaded the archive manually. The solution I found is that you have to specify the mode in which you're downloading the file. Something like:
>
> URL <- "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE162nnn/GSE162562/suppl/GSE162562_RAW.tar"
> FILE <- file.path(tempdir(), basename(URL))
>
> utils::download.file(URL, FILE, mode = "wb")
> utils::untar(FILE, exdir = dirname(FILE))
>
> worked perfectly for me. It also still seems to work on Ubuntu, but you can let us know if you find it doesn't. I hope this helps!
>
> On Mon, Aug 23, 2021 at 3:20 PM Anas Jamshed <anasjamshed1994 at gmail.com> wrote:
>
>> I am trying this URL:
>> https://ftp.ncbi.nlm.nih.gov/geo/series/GSE162nnn/GSE162562/suppl/GSE162562_RAW.tar
>>
>> but it is not giving me any file.
>>
>> On Mon, Aug 23, 2021 at 11:42 PM Andrew Simmons <akwsimmo at gmail.com> wrote:
>>
>>> Hello,
>>>
>>> I don't think you need to use a system command directly; 'utils::untar' should be all you need. I tried the same thing myself, something like:
>>>
>>> URL <- "https://exiftool.org/Image-ExifTool-12.30.tar.gz"
>>> FILE <- file.path(tempdir(), basename(URL))
>>>
>>> utils::download.file(URL, FILE)
>>> utils::untar(FILE, exdir = dirname(FILE))
>>>
>>> and it makes a folder "Image-ExifTool-12.30". It seems to work perfectly fine on Windows 10 x64 build 19042. Can you send the specific file (or provide a URL to the specific file) that isn't working for you?
>>>
>>> On Mon, Aug 23, 2021 at 12:53 PM Anas Jamshed <anasjamshed1994 at gmail.com> wrote:
>>>
>>>> I have the file GSE162562_RAW. First I untar it with untar("GSE162562_RAW.tar"), and then I run:
>>>>
>>>> system("gunzip ~/Desktop/GSE162562_RAW/*.gz")
>>>>
>>>> This runs fine on Linux but not on Windows. What changes should I make to run this command on Windows as well?
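For the original question about running gunzip on Windows: the decompression can also be done from within R, so no shell command is needed on either platform. The snippet below is only a sketch and assumes the R.utils package (not part of base R) is installed; the directory path is the one used in the code above.

# decompress every .gz file in the download directory from within R;
# a minimal sketch assuming the R.utils package is available
gz_files <- list.files("~/Desktop/GSE162562_RAW", pattern = "\\.gz$",
                       full.names = TRUE)
for (f in gz_files) {
    # writes the decompressed .txt next to the .gz and keeps the original
    R.utils::gunzip(f, remove = FALSE, overwrite = TRUE)
}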
Hello,

I see what you're saying, that the .tar archive contains many more compressed files, but that's not necessarily a problem. R can read directly from a compressed file without having to decompress it beforehand. I modified my code to look a little more like yours:

# need to do 'path.expand' or 'untar' will fail
# this is where we put the downloaded files
exdir <- path.expand("~/GSE162562_RAW")
dir.create(exdir, showWarnings = FALSE)

URL <- "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE162nnn/GSE162562/suppl/GSE162562_RAW.tar"
FILE <- file.path(tempdir(), basename(URL))

utils::download.file(URL, FILE, mode = "wb")
utils::untar(FILE, exdir = exdir)
unlink(FILE, recursive = TRUE, force = TRUE)

# 'files' holds the full paths to the downloaded files;
# its 'names' attribute is the basename with '.txt.gz' removed from the end
files <- list.files(exdir, full.names = TRUE)
names(files) <- sub("\\.txt\\.gz$", "", basename(files))

# R can open compressed files without decompressing them beforehand
print(utils::read.table(files[[1]], sep = "\t"))
print(utils::read.delim(files[[2]], header = FALSE))

Does this work better than before for you?
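Continuing from the named 'files' vector built above, a minimal sketch of reading every sample into a named list; the column names 'gene' and 'count' are chosen here just for readability, since the files themselves have no header line (each has two unnamed columns, a gene identifier and a count, as the output later in the thread shows).

# read every compressed sample file into a named list of data frames
counts <- lapply(files, function(f) {
    utils::read.delim(f, header = FALSE, col.names = c("gene", "count"))
})

length(counts)     # one data.frame per GSM sample (108 in this series)
str(counts[[1]])   # peek at the first sample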
Hello,

Are you looking for what follows Andrew's code below to download and untar the files?

read_one_gz_file <- function(x, path) {
    fl <- file.path(path, x)
    tryCatch({
        read.table(zz <- gzfile(fl))
    },
    warning = function(w) w,
    error = function(e) e
    )
}

URL <- "https://ftp.ncbi.nlm.nih.gov/geo/series/GSE162nnn/GSE162562/suppl/GSE162562_RAW.tar"
FILE <- file.path(tempdir(), basename(URL))

utils::download.file(URL, FILE, mode = "wb")
utils::untar(FILE, exdir = dirname(FILE))

fls <- list.files(path = dirname(FILE), pattern = "\\.gz$")
length(fls)
#[1] 108

data_list <- lapply(fls, read_one_gz_file, path = dirname(FILE))
length(data_list)
#[1] 108

head(data_list[[1]])
#        V1  V2
#1     A1BG   4
#2 A1BG-AS1  52
#3     A1CF  12
#4      A2M 645
#5  A2M-AS1 113
#6    A2ML1  21

I don't understand what you mean by aggregating the files, but if you want them all in one data frame, maybe this will do it.

sapply(data_list, ncol)    # all files have 2 columns

# create a column with the original dataset name
data_list <- lapply(seq_along(data_list), function(i) {
    dftmp <- data_list[[i]]
    dftmp$dataset <- sub("\\.txt\\.gz$", "", fls[i])
    dftmp
})

# put all data sets in one data.frame
df1 <- do.call(rbind, data_list)
dim(df1)    # over 2.8 million rows, 3 columns
head(df1)   # see the first 6 rows
#        V1  V2                 dataset
#1     A1BG   4 GSM4954457_A_1_Asymptom
#2 A1BG-AS1  52 GSM4954457_A_1_Asymptom
#3     A1CF  12 GSM4954457_A_1_Asymptom
#4      A2M 645 GSM4954457_A_1_Asymptom
#5  A2M-AS1 113 GSM4954457_A_1_Asymptom
#6    A2ML1  21 GSM4954457_A_1_Asymptom

Hope this helps,

Rui Barradas
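If "aggregate" was instead meant as a single gene-by-sample count matrix rather than a long data frame, the long 'df1' built above can be pivoted with base R. This is only a sketch, not something from the thread, and it assumes each gene symbol appears at most once per sample; any duplicates would be summed by xtabs().

# pivot the long data.frame 'df1' into a gene-by-sample count matrix
count_mat <- stats::xtabs(V2 ~ V1 + dataset, data = df1)

dim(count_mat)        # number of genes x 108 samples
count_mat[1:5, 1:3]   # peek at the top-left corner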