Ek Esawi
2018-Dec-20 03:22 UTC
[R] Combine recursive lists in a single list or data frame and write it to file
Thank you Jim. I did use unlist with the recursive option, which converted the 3-level list to a list of 38 matrices. I tried your earlier function to join the 38 matrices, all of which have different numbers of columns and rows, but I kept getting an error.

fillList<-function(x) {
  maxrows<-max(unlist(lapply(x,length)))
  return(lapply(x,"[",1:maxrows))
}

for (i in 1:length(MyTables)) {
  write.table(as.data.frame(fillList(MyTables[i])),
              file = "Temp.txt",append = TRUE,quote = TRUE)}
Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE,  :
  arguments imply differing number of rows: 3, 55, 56, 53, 54, 16, 21, 23, 50, 24

On Wed, Dec 19, 2018 at 9:36 PM Jim Lemon <drjimlemon at gmail.com> wrote:
>
> Hi Ek,
> Look at unlist and the argument "recursive". You can step down through
> the levels of a nested list to convert it to a single level list.
>
> Jim
>
> On Thu, Dec 20, 2018 at 1:33 PM Ek Esawi <esawiek at gmail.com> wrote:
> >
> > Thank you Bert. I don't see how unlist will help. I want to combine
> > them but keep the "rectangular structure", e.g. list, data frame,
> > matrix, because I want to get the tables in their original form.
> > Unlist converts the whole output to a single vector, unless I am
> > missing something.
> >
> > On Wed, Dec 19, 2018 at 9:10 PM Bert Gunter <bgunter.4567 at gmail.com> wrote:
> > >
> > > Does ?unlist not help? Why not?
> > >
> > > Bert
> > >
> > >
> > > On Wed, Dec 19, 2018, 5:13 PM Ek Esawi <esawiek at gmail.com> wrote:
> > >>
> > >> Hi All,
> > >>
> > >> I am using the R tabulizer package to extract tables from pdf files.
> > >> The output is a set of lists of matrices. The package extracts tables
> > >> and a lot of extra stuff which is nearly impossible to clean with
> > >> RegEx. So, I want to clean it manually.
> > >> To do so I need to (1) combine all lists in a single list or data
> > >> frame and (2) then write the single entity to a text file to edit it.
> > >> I could not figure out how.
> > >>
> > >> I tried something like this, but it did not work.
> > >> lapply(MyTables, function(x)
> > >> lapply(x,write.table(file="temp.txt",append = TRUE)))
> > >>
> > >> Any help is greatly appreciated.
> > >>
> > >> Here is my code:
> > >>
> > >> install.packages("rJava") ;library(rJava)
> > >> install.packages("tabulizer");library(tabulizer)
> > >> MyPath <- "C:/Users/name/Documents/tEMP"
> > >> ExtTable <- function (Path,CalOrd){
> > >>   FileNames <- dir(Path, pattern =".(pdf|PDF)",full.names = TRUE)
> > >>   MyFiles <- lapply(FileNames, function(i) extract_tables(i,method = "stream"))
> > >>   if(CalOrd == "Yes"){
> > >>     MyOFiles <- gsub("(\\s.*)|(.pdf|.PDF)","",basename(FileNames))
> > >>     MyOFiles <- match(MyOFiles,month.name)
> > >>     MyNFiles <- MyFiles[order(MyOFiles)]}
> > >>   else
> > >>     MyFiles
> > >> }
> > >> MyTables <- ExtTable(Path=MyPath,CalOrd = "No")
> > >>
> > >> Here is a cleaned portion of the output. The whole output consists of 3
> > >> lists, containing 12, 15, and 12 sub-lists respectively.
> > >>
> > >> [[2]][[2]]
> > >>       [,1]        [,2]    [,3]    [,4]  [,5]    [,6]    [,7]    [,8]    [,9]  [,10]
> > >>  [1,] ""          "Avg."  "+_ lo" "n"   "Med."  ""      "Avg."  "+_ lo" "n"   "Med."
> > >>  [2,] "SiOz"      "44.0"  "1.26"  "375" "44.1"  "Nb"    "4.8"   "6.3"   "58"  "2.7"
> > >>  [3,] "T i O 2"   "0.09"  "0.09"  "561" "0.09"  "Mo(b)" "50"    "30"    "3"   "35"
> > >>  [4,] "A1203"     "2.27"  "1.10"  "375" "2.20"  "Ru(b)" "12.4"  "4.1"   "3"   "12"
> > >>  [5,] "FeO total" "8.43"  "1.14"  "375" "8.19"  "Pd(b)" "3.9"   "2.1"   "19"  "4.1"
> > >>  [6,] "MnO"       "0.14"  "0.03"  "366" "0.14"  "Ag(b)" "6.8"   "8.3"   "17"  "4.8"
> > >>  [7,] "MgO"       "41.4"  "3.00"  "375" "41.2"  "Cd(b)" "41"    "14"    "16"  "37"
> > >>  [8,] "CaO"       "2.15"  "1.11"  "374" "2.20"  "In(b)" "12"    "4"     "19"  "12"
> > >>  [9,] "Na20"      "0.24"  "0.16"  "341" "0.21"  "Sn(b)" "54"    "31"    "6"   "36"
> > >> [10,] "K20"       "0.054" "0.11"  "330" "0.028" "Sb(b)" "3.9"   "3.9"   "11"  "3.2"
> > >> [11,] "P205"      "0.056" "0.11"  "233" "0.030" "Te(b)" "11"    "4"     "18"  "10"
> > >> [12,] "Total"     "98.88" ""      ""    "98.43" "Cs(b)" "10"    "16"    "17"  "1.5"
> > >> [13,] ""          ""      ""      ""    ""      "Ba"    "33"    "52"    "75"  "17"
> > >> [14,] "Mg-value"  "89.8"  "1.1"   "375" "90.0"  "La"    "2.60"  "5.70"  "208" "0.77"
> > >> [15,] "Ca/AI"     "1.28"  "1.6"   "374" "1.35"  "Ce"    "6.29"  "11.7"  "197" "2.08"
> > >> [16,] "AI/Ti"     "22"    "29"    "361" "22"    "Pr"    "0.56"  "0.87"  "40"  "0.21"
> > >> [17,] "F e / M n" "60"    "10"    "366" "59"    "Nd"    "2.67"  "4.31"  "162" "1.52"
> > >> [18,] ""          ""      ""      ""    ""      "Sm"    "0.47"  "0.69"  "214" "0.25"
> > >> [19,] "Li"        "1.5"   "0.3"   "6"   "1.5"   "Eu"    "0.16"  "0.21"  "201" "0.097"
> > >> [20,] "B"         "0.53"  "0.07"  "6"   "0.55"  "Gd"    "0.60"  "0.83"  "67"  "0.31"
> > >> [21,] "C"         "110"   "50"    "13"  "93"    "Tb"    "0.070" "0.064" "146" "0.056"
> > >> [22,] "F"         "88"    "71"    "15"  "100"   "Dy"    "0.51"  "0.35"  "58"  "0.47"
> > >> [23,] "S"         "157"   "77"    "22"  "152"   "Ho"    "0.12"  "0.14"  "54"  "0.090"
> > >> [24,] "C1"        "53"    "45"    "15"  "75"    "Er"    "0.30"  "0.22"  "52"  "0.28"
> > >> [25,] "Sc"        "12.2"  "6.4"   "220" "12.0"  "Tm"    "0.038" "0.026" "40"  "0.035"
> > >> [26,] "V"         "56"    "21"    "132" "53"    "Yb"    "0.26"  "0.14"  "201" "0.27"
> > >> [27,] "Cr"        "2690"  "705"   "325" "2690"  "Lu"    "0.043" "0.023" "172" "0.045"
> > >> [28,] "Co"        "112"   "10"    "166" "111"   "Hf"    "0.27"  "0.30"  "71"  "0.17"
> > >> [29,] "Ni"        "2160"  "304"   "308" "2140"  "Ta"    "0.40"  "0.51"  "38"  "0.23"
> > >> [30,] "Cu"        "11"    "9"     "94"  "9"     "W(b)"  "7.2"   "5.2"   "6"   "4.0"
> > >> [31,] "Zn"        "65"    "20"    "129" "60"    "Re(b)" "0.13"  "0.11"  "18"  "0.09"
> > >> [32,] "Ga"        "2.4"   "1.3"   "49"  "2.4"   "Os(b)" "4.0"   "1.8"   "18"  "3.7"
> > >> [33,] "Ge"        "0.96"  "0.19"  "19"  "0.92"  "Ir(b)" "3.7"   "0.9"   "34"  "3.0"
> > >> [34,] "As"        "0.11"  "0.07"  "7"   "0.10"  "Pt(b)" "7"     "-"     "1"   "-"
> > >> [35,] "Se"        "0.041" "0.056" "18"  "0.025" "Au(b)" "0.65"  "0.53"  "30"  "0.5"
> > >> [36,] "Br"        "0.01"  "0.01"  "6"   "0.01"  "Tl(b)" "1.2"   "1.0"   "13"  "0.9"
> > >> [37,] "Rb"        "1,9"   "4.8"   "97"  "0.38"  "Pb"    "0.16"  "0.11"  "17"  "0.16"
> > >> [38,] "Sr"        "49"    "60"    "110" "20"    "Bi(b)" "1.7"   "0.7"   "13"  "1.6"
> > >> [39,] "Y"         "4.4"   "5.5"   "86"  "3.1"   "Th*"   "0.71"  "1.2"   "71"  "0.22"
> > >> [40,] "Zr"        "21"    "42"    "82"  "8.0"   "U"     "0.12"  "0.23"  "48"  "0.040"
> > >>
> > >> [[2]][[4]]
> > >>       [,1]       [,2]                 [,3]     [,4]                 [,5]     [,6]
> > >>  [1,] ""         "Spinel peridotites" ""       "Garnet peridotites" ""       "Primitive"
> > >>  [2,] ""         "Avg. Meal."         "M-A sp" "M-A gt B-M"         "Jordan" "mantle"
> > >>  [3,] "SiO 2"    "44.0 44.1"          "44.15"  "44.99 45.00"        "45.55"  "44.8"
> > >>  [4,] "TiO 2"    "0.09 0.09"          "0.07"   "0.06 0.08"          "0.11"   "0.21"
> > >>  [5,] "A1203"    "2.27 2.20"          "1.96"   "1.40 1.31"          "1.43"   "4.45"
> > >>  [6,] "Cr203"    "0.39 0.39"          "0.44"   "0.32 0.38"          "0.34"   "0.43"
> > >>  [7,] "FeOtotal" "8.43 8.19"          "8.28"   "7.89 6.97"          "7.61"   "8.40"
> > >>  [8,] "Mn O"     "0.14 0.14"          "0.12"   "0.11 0.13"          "0.11"   "0.14"
> > >>  [9,] "MgO"      "41.4 41.2"          "42.25"  "42.60 44.86"        "43.55"  "37.2"
> > >> [10,] "NiO"      "0.27 0.27"          "0.27"   "0.26 0.29"          "-"      "0.24"
> > >> [11,] "CaO"      "2.15 2.20"          "2.08"   "0.82 0.77"          "1.05"   "3.60"
> > >> [12,] "Na 20"    "0.24 0.21"          "0.18"   "0.11 0.09"          "0.14"   "0.34"
> > >> [13,] "K 2 0"    "0.054 0.028"        "0.05"   "0.04 0.10"          "0.11"   "0.028"
> > >> [14,] "P205"     "0.056 0.030"        "0.02"   "- 0.01"             "-"      "0.022"
> > >> [15,] "Total"    "99.49 99.05"        "99.87"  "98.60 100.00"       "100.00" "99.86"
> > >> [16,] "Mg-value" "89.8 90.0"          "90.1"   "90.6 92.0"          "91.1"   "88.8"
> > >> [17,] "olivine"  "62 63"              "67"     "65 68"              "66"     "56 57"
> > >> [18,] "opx"      "24 24"              "22"     "28 25"              "28"     "22 17"
> > >> [19,] "cpx"      "12 11"              "9"      "3 2"                "3"      "19 10"
> > >> [20,] "spinel"   "2 2"                "2"      "- -"                "-"      "3 -"
> > >>
> > >> Here is portion of the output for str(MyTables):
> > >>
> > >> str(MyTables)
> > >>
> > >> List of 3
> > >>  $ :List of 12
> > >>   ..$ : chr [1:3, 1:2] "south of the artificial lake Lokka. Intrusive complexes" "of alkaline rocks are found at Sokli (phosphorite-bear-" "ing and a possible Nb-occurrence) in Finland, and at" "(Eriksson, 1992). During this period, Northern Europe" ...
> > >>   ..$ : chr [1:55, 1:15] "Element" "Ag" "Al" "Al_XRF" ...
> > >>   ..$ : chr [1:56, 1:2] "in the till is mainly of local origin, although some cob-" "bles and boulders may have been transported over sev-" "eral kilometres. The moraine formations in the study" "area are mostly gravelly and sandy tills, locally hum-" ...
> > >>   ..$ : chr [1:53, 1:2] "requisites. PCA accounts for maximum variance of all" "variables, while FA is based on the correlation structure" "of the variables. The model of factor analysis allows that" "the common factors do not explain the total variation of" ...
> > >>   ..$ : chr [1:54, 1:7] "lished examples of the use of factor analysis, it is neglec-" "ted that regional geochemical (and environmental) data" "almost never follow a normal distribution. Continuing Method" "with factor analysis in such a case must lead to biased" ...
> > >>   ..$ : chr [1:16, 1:2] "shows the factor loadings of the different variables" "entering each factor. Names of variables with an abso-" "lute value of the loadings <0.3 are not plotted. Fig. 5" "shows 8 results of factor analyses using a selection of all" ...
> > >>   ..$ : chr [1:21, 1:2] "pretable results, notwithstanding the fact that on the" "basis of the foregoing discussion it should probably not" "be used with these data. Do these results warrant the use" "of a quite work-intensive method? Unfortunately not," ...
> > >>   ..$ : chr [1:55, 1:8] "" "Ag" "Al" "Al_XRF" ...
> > >>   ..$ : chr [1:23, 1:2] "addition, geochemical reasoning (e.g. geochemical asso-" "ciations and/or pathfinder elements for different types of" "ore deposits) was used to select further sub-sets of vari-" "ables. In geochemistry, the selection of elements entered" ...
> > >>   ..$ : chr [1:55, 1:2] "Fig. 10C cuts several geological units, and is most likely" "indicative of alteration processes related to a deep-" "seated fault. It was revealed again in a factor analysis" "carried out with all those elements extracted by aqua" ...
> > >>   ..$ : chr [1:50, 1:2] "well justified in stating that it is not very scientific to" "play with the selection of elements and number of fac-" "tors extracted until one ‘‘finds’’ an ‘‘interesting’’ result." "On the other hand, even all the different results pre-" ...
> > >>   ..$ : chr [1:24, 1:2] "Niemelä, J., Ekman, I., Lukashov, A. (Eds.), 1993. Quaternary" "Deposits of Finland and Northwestern Part of Russian Fed-" "eration and Their Resources 1:1,000,000. Geological Survey" "of Finland, Espoo, Finland." ...
> > >>  $ :List of 15
> > >>
> > >> ______________________________________________
> > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > >> https://stat.ethz.ch/mailman/listinfo/r-help
> > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > >> and provide commented, minimal, self-contained, reproducible code.
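
The error above lists the row counts of the individual matrices, which suggests that as.data.frame() was asked to bind tables of different heights side by side. A minimal sketch of one alternative, assuming the goal is only a hand-editable text file: flatten the nested list by one level with unlist(recursive = FALSE) and write each matrix on its own. The toy MyTables, the OutFile path, and the "### Table" marker lines are illustrative assumptions, not part of the original thread.

```r
# Hypothetical stand-in for the tabulizer output: a list of lists of
# character matrices with differing dimensions.
MyTables <- list(
  list(matrix(letters[1:6], nrow = 3), matrix(letters[1:8], nrow = 2)),
  list(matrix(LETTERS[1:10], nrow = 5))
)

# Drop one level of nesting; the matrices themselves stay intact.
FlatTables <- unlist(MyTables, recursive = FALSE)

# Assumed output location; start from an empty file on each run.
OutFile <- file.path(tempdir(), "Temp.txt")
if (file.exists(OutFile)) file.remove(OutFile)

for (i in seq_along(FlatTables)) {
  # A marker line makes each table easy to find when editing by hand.
  cat("### Table", i, "\n", file = OutFile, append = TRUE)
  # Each matrix is written separately, so differing shapes do not matter.
  write.table(FlatTables[[i]], file = OutFile, append = TRUE,
              quote = TRUE, sep = "\t",
              row.names = FALSE, col.names = FALSE)
}
```

Calling unlist(MyTables) with the default recursive = TRUE would instead collapse everything into a single character vector, which is the behaviour objected to earlier in the thread.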
Jim Lemon
2018-Dec-20 05:28 UTC
[R] Combine recursive lists in a single list or data frame and write it to file
Hi Ek,
It looks to me as though you are not joining the lists into a single list, then calling fillList, and then converting to a data frame. If you can send some data (if it's not too big), I can test it and make sure that it works, as it did every time for me.

Jim
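
The sequence Jim describes (join into one list, pad, convert to a data frame) might look like the following sketch. It does not use fillList; instead it assumes a hypothetical helper, padCols, that widens each matrix with NA so that all tables can be stacked. FlatTables is toy data standing in for the flattened tabulizer output, and the output path is likewise an assumption.

```r
# Hypothetical flat list of character matrices with differing shapes.
FlatTables <- list(
  matrix(letters[1:6], nrow = 3),    # 3 rows x 2 columns
  matrix(LETTERS[1:8], nrow = 2)     # 2 rows x 4 columns
)

# padCols is a hypothetical helper: copy m into an NA-filled matrix
# that has `width` columns and the same number of rows.
padCols <- function(m, width) {
  out <- matrix(NA_character_, nrow = nrow(m), ncol = width)
  out[, seq_len(ncol(m))] <- m
  out
}

MaxCols <- max(vapply(FlatTables, ncol, integer(1)))
Padded  <- lapply(FlatTables, padCols, width = MaxCols)

# Stack the padded matrices and record which table each row came from.
Combined <- data.frame(
  Table = rep(seq_along(FlatTables), vapply(FlatTables, nrow, integer(1))),
  do.call(rbind, Padded),
  stringsAsFactors = FALSE
)

# One tab-separated file; padding cells are written as empty strings.
OutFile <- file.path(tempdir(), "Combined.txt")
write.table(Combined, file = OutFile, sep = "\t", quote = TRUE,
            row.names = FALSE, na = "")
```

Stacking rows rather than binding columns keeps each table rectangular, and the Table column allows the rows to be split back into separate tables after manual cleaning.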
Ek Esawi
2018-Dec-20 13:27 UTC
[R] Combine recursive lists in a single list or data frame and write it to file
Thanks again Jim. The links below are for 2 files (papers) I downloaded from Google Scholar for testing. You can use either, both, or any other PDF files with tables. Thanks again-EK.

https://pdfs.semanticscholar.org/50a4/2b8146f08161b1036457fe0d241b6b898974.pdf
https://pdfs.semanticscholar.org/50a4/2b8146f08161b1036457fe0d241b6b898974.pdf

The code:

install.packages("rJava") ;library(rJava)
install.packages("tabulizer");library(tabulizer)
MyPath <- "C:/Users/name/Documents/Temp"
ExtTable <- function (Path,CalOrd){
  FileNames <- dir(Path, pattern =".(pdf|PDF)",full.names = TRUE)
  MyFiles <- lapply(FileNames, function(i) extract_tables(i,method = "stream"))
  if(CalOrd == "Yes"){
    MyOFiles <- gsub("(\\s.*)|(.pdf|.PDF)","",basename(FileNames))
    MyOFiles <- match(MyOFiles,month.name)
    MyNFiles <- MyFiles[order(MyOFiles)]}
  else
    MyFiles
}
MyTables <- ExtTable(Path=MyPath,CalOrd = "Yes")
10C cuts several geological units, and > > > > >> is most likely" "indicative of alteration processes related to a > > > > >> deep-" "seated fault. It was revealed again in a factor analysis" > > > > >> "carried out with all those elements extracted by aqua" ... > > > > >> ..$ : chr [1:50, 1:2] "well justified in stating that it is not very > > > > >> scientific to" "play with the selection of elements and number of > > > > >> fac-" "tors extracted until one > > > > >> ?\200\230?\200\230finds?\200\231?\200\231 an > > > > >> ?\200\230?\200\230interesting?\200\231?\200\231 result." "On the other > > > > >> hand, even all the different results pre-" ... > > > > >> ..$ : chr [1:24, 1:2] "Niemel??, J., Ekman, I., Lukashov, A. (Eds.), > > > > >> 1993. Quaternary" "Deposits of Finland and Northwestern Part of > > > > >> Russian Fed-" "eration and Their Resources 1:1,000,000. Geological > > > > >> Survey" "of Finland, Espoo, Finland." ... > > > > >> $ :List of 15 > > > > >> > > > > >> ______________________________________________ > > > > >> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > > > >> https://stat.ethz.ch/mailman/listinfo/r-help > > > > >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > > > >> and provide commented, minimal, self-contained, reproducible code. > > > > > > > > ______________________________________________ > > > > R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see > > > > https://stat.ethz.ch/mailman/listinfo/r-help > > > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > > > and provide commented, minimal, self-contained, reproducible code.
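
For reference, here is a minimal sketch of one way to do what the original question asks. It assumes, as the str() output above suggests, that MyTables is a list of lists of character matrices. The toy MyTables, the out_file name and the "### table" separator line are made up for illustration and are not from the thread. Since the matrices have different dimensions, the sketch does not try to bind them into one data frame (which is where the "differing number of rows" error comes from). Instead it flattens the nesting one level and appends each matrix to the same text file as its own block.

```r
# Hypothetical stand-in for the tabulizer output:
# a list of lists of character matrices with differing dimensions.
MyTables <- list(
  list(matrix(c("a", "b", "c", "d"), nrow = 2),
       matrix(as.character(1:6), nrow = 3)),
  list(matrix(c("x", "y", "z"), nrow = 1))
)

# Flatten one level only, so each element of 'flat' is still a matrix.
# (recursive = TRUE would collapse everything into a single vector.)
flat <- unlist(MyTables, recursive = FALSE)

# Append each matrix to one file, preceded by a separator line so the
# original tables can still be told apart when editing by hand.
out_file <- tempfile(fileext = ".txt")
for (i in seq_along(flat)) {
  cat(sprintf("### table %d\n", i), file = out_file, append = TRUE)
  write.table(flat[[i]], file = out_file, append = TRUE, quote = TRUE,
              sep = "\t", row.names = FALSE, col.names = FALSE)
}
```

Each table keeps its own number of rows and columns in the file. If the real list were nested more deeply, the unlist(..., recursive = FALSE) step would presumably need to be repeated once per extra level.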