Use basename(filename) to remove the leading parts of the full path to the file. E.g., replace
FNs <- sort(match(sub("\\.PDF", "", file.names), month.name))
with (the untested)
FNs <- sort(match(sub("\\.PDF", "", basename(file.names)), month.name))

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Tue, Oct 9, 2018 at 1:38 PM, Ek Esawi <esawiek at gmail.com> wrote:
> Hi again,
>
> I worked with Rui's idea of using the match function with month.name.
> I got numerical values for the months, then I sorted them and pasted the PDF
> file extension. It gave me the file order I wanted, but now statements
> 8, 9, and 10 don't work and I keep getting the error listed below.
> The dilemma is that if I add full.names=TRUE in statement 6, then statements
> 9 and 10 don't produce what they did earlier. If I put
> full.names=FALSE, then I am back to square one.
> Any idea is greatly appreciated.
>
> The code:
>
> 1. install.packages("tabulizer")
> 2. install.packages("stringr")
> 3. library(stringr)
> 4. library(tabulizer)
> 5. path = "C:/Users/namei/Documents/TextMining/S2017"
> 6. file.names <- dir(path, pattern =".PDF",full.names = TRUE)
> 7. file.names <- str_remove(file.names,"\\s[0-9][0-9]")
> 8. FNs <- sort(match(sub("\\.PDF", "", file.names), month.name))
> 9. FNs1 <- paste0(month.name[FNs],".","PDF")
> 10. A <- lapply(FNs1, function(i) extract_tables(i))
>
> Output and the error message:
>
> > path = "C:/Users/eesawi/Documents/TextMining/S2017"
> > file.names <- dir(path, pattern =".PDF",full.names = TRUE)
> > file.names <- str_remove(file.names,"\\s[0-9][0-9]")
> > FNs <- sort(match(sub("\\.PDF", "", file.names), month.name))
> > FNs1 <- paste0(month.name[FNs],".","PDF")
> > A <- lapply(FNs1, function(i) extract_tables(i))
>
> Error in normalizePath(path.expand(path), winslash, mustWork) :
>   path[1]=".PDF": The system cannot find the file specified
>
> On Tue, Oct 9, 2018 at 9:44 AM Ek Esawi <esawiek at gmail.com> wrote:
> >
> > Hi All--
> >
> > I used the base R list.files function to read files from a directory. The
> > file names are months (April, August, etc.), so the system reads
> > them in alphabetical order, but I want to reorder them in calendar
> > order (January, February, ..., December). I thought I might be able to
> > do it via RegEx or possibly the gtools package, but I am wondering if there is
> > an easier way.
> >
> > Thanks--EK
> >
> > Example
> > path = "C:/Users/name/Downloads/MyFiles"
> > file.names <- dir(path, pattern =".PDF")
> >
> > Example output:
> > "February.PDF" "January.PDF" "March.PDF"
> > Desired output:
> > "January.PDF" "February.PDF" "March.PDF"
>
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
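Why the error names ".PDF": a small sketch that needs no PDF files, using three hypothetical full paths under C:/tmp in place of what dir(path, full.names = TRUE) returns. With the directory still attached, match() finds nothing in month.name, sort() silently drops the resulting NAs, and paste0() turns the now-empty vector into the single string ".PDF" (a zero-length argument is treated as "" when the other arguments are non-empty). Wrapping the names in basename(), as Bill suggests, restores the matches.

```r
# Hypothetical full paths of the kind dir(path, full.names = TRUE) returns
file.names <- c("C:/tmp/March.PDF", "C:/tmp/January.PDF", "C:/tmp/February.PDF")

# Statement 8 as posted: "C:/tmp/March" etc. is not in month.name, so all NA
FNs <- sort(match(sub("\\.PDF", "", file.names), month.name))
print(FNs)     # integer(0): sort() removes NAs by default (na.last = NA)

# Statement 9: the empty vector is recycled to "", leaving only ".PDF"
FNs1 <- paste0(month.name[FNs], ".", "PDF")
print(FNs1)    # ".PDF", the path named in the error message

# With basename(), only "March", "January", "February" are matched
FNs.ok <- sort(match(sub("\\.PDF", "", basename(file.names)), month.name))
print(FNs.ok)  # 1 2 3
```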
Thank you Bill and Rui. I used month.name with sort and basename, as
suggested by Bill. I got the sorted numerical values, then I used
month.name to get properly ordered month names. The problem is that I
have to paste the PDF extension onto the names to get the correctly
ordered file names, but then I get the same error message, which
suggests that the code is not reading the files properly.

I have not tried Rui's suggestion yet, but I will if nothing else works out.

Thanks again--EK

I had to strip the PDF extension off file.names, but when I paste
month.name with .PDF to get the correct file names, I get the same
error.

On Tue, Oct 9, 2018 at 4:47 PM William Dunlap <wdunlap at tibco.com> wrote:
> Use basename(filename) to remove the leading parts of the full path to the file. [...]
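One likely reason the pasted names still fail: strings such as "January.PDF" carry no directory, so extract_tables() looks for them relative to the working directory rather than in path. A minimal sketch of keeping the sort-then-paste approach but putting the directory back with dirname() and file.path(); it assumes all files sit in a single (hypothetical) directory C:/tmp and are named exactly after the month.

```r
# Hypothetical full paths; all files assumed to live in one directory
file.names <- c("C:/tmp/March.PDF", "C:/tmp/January.PDF", "C:/tmp/February.PDF")

dir.name <- unique(dirname(file.names))             # "C:/tmp"
months <- sub("\\.PDF$", "", basename(file.names))  # "March" "January" "February"
FNs <- sort(match(months, month.name))              # 1 2 3

# Rebuild the month names, add the extension, then put the directory back on
FNs1 <- file.path(dir.name, paste0(month.name[FNs], ".PDF"))
print(FNs1)
```

This only works if the rebuilt names exist on disk. Statement 7's str_remove() rewrites names (a hypothetical "January 17.PDF" would become "January.PDF"), and rebuilt paths like these would then point at files that are not there; ordering the original vector instead sidesteps that.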
On 10/10/2018 7:23 PM, Ek Esawi wrote:
> Thank you Bill and Rui. I used month.name with sort and basename, as
> suggested by Bill. I got the sorted numerical values, then I used
> month.name to get properly ordered month names. The problem is that I
> have to paste the PDF extension onto the names to get the correctly
> ordered file names, but then I get the same error message, which
> suggests that the code is not reading the files properly.

You shouldn't need to do any pasting. Extract the months, use the
order() function to find their proper order, then apply that vector to
the original vector of filenames.

Duncan Murdoch
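A sketch of the order()-based approach Duncan describes, with hypothetical file names under C:/tmp, two of which carry a two-digit suffix like the ones statement 7 strips. The month is taken as the leading run of letters in each base name (an assumption about the naming scheme); order() then returns the permutation that puts those months in calendar order, and that permutation is applied to the untouched file.names, so nothing has to be pasted back together.

```r
# Hypothetical names; two carry the kind of " 17" suffix statement 7 removes
file.names <- c("C:/tmp/March 17.PDF", "C:/tmp/January.PDF", "C:/tmp/February 17.PDF")

# Month = leading letters of each base name (assumed naming scheme)
months <- sub("^([[:alpha:]]+).*$", "\\1", basename(file.names))

# order() returns positions, not values: here c(2, 3, 1)
o <- order(match(months, month.name))

# Subscript the original, unmodified paths
sorted.files <- file.names[o]
print(sorted.files)
```

Because every element of sorted.files is still a real path, it could be passed to lapply() with extract_tables() just as in statement 10.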
You can paste the directory names, dirname(file.names), back on with
file.path() after you do the sorting. A better idiom is to use order()
instead of sort() and to use order's output to subscript file.names.
E.g., the following sorts by year and month number.

> file.names <- c("C:/tmp/June_2018.PDF", "C:/tmp/May_2018.PDF","C:/tmp/October_2016.PDF")
> bfile.names <- sub("\\..*$", "", basename(file.names))
> bfile.names
[1] "June_2018" "May_2018" "October_2016"
> month <- sub("^([[:alpha:]]+)_.*$", "\\1", bfile.names)
> month
[1] "June" "May" "October"
> month.names
Error: object 'month.names' not found
> month.names <-c("January","February","March","April","May","June","July","August","September","October","November","December")
> month.number <- match(month, month.names)
> month.number
[1] 6 5 10
> year <- as.integer(sub("^.*_([0-9]+)$", "\\1", bfile.names))
> file.names[ order(year, month.number) ]
[1] "C:/tmp/October_2016.PDF" "C:/tmp/May_2018.PDF" "C:/tmp/June_2018.PDF"

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Wed, Oct 10, 2018 at 4:23 PM, Ek Esawi <esawiek at gmail.com> wrote:
> Thank you Bill and Rui. I used month.name with sort and basename, as
> suggested by Bill. [...]
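A self-contained version of the year-and-month ordering from Bill's session, as a sketch. It assumes the hypothetical Month_YYYY naming used in his example, computes year explicitly from the digits after the underscore, and uses R's built-in month.name constant instead of typing the month names out. order() with two keys sorts by year first and breaks ties by month.

```r
# Hypothetical paths following a Month_YYYY naming scheme
file.names <- c("C:/tmp/June_2018.PDF", "C:/tmp/May_2018.PDF", "C:/tmp/October_2016.PDF")

# Strip directory and extension: "June_2018" "May_2018" "October_2016"
bfile.names <- sub("\\..*$", "", basename(file.names))

# Month is the text before the underscore, year the digits after it
month <- sub("^([[:alpha:]]+)_.*$", "\\1", bfile.names)
year <- as.integer(sub("^.*_([0-9]+)$", "\\1", bfile.names))

# month.name is built into R
month.number <- match(month, month.name)

# Sort by year, then by month within year, and subscript the original paths
sorted.files <- file.names[order(year, month.number)]
print(sorted.files)
```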