Dear R Users,

I am working on extracting tables from a PDF and writing them to a CSV file. When I executed the code, the tables were not properly written to the CSV file.

Here is my code:

library(tabulizer)
# Location of pdf file.
location <- 'http://keic.mica-apps.net/wwwisis/ET_Annual_Reports/Religare_Enterprises_Ltd/RELIGARE-2017-2018.pdf'

# Extract the table
out <- extract_tables(location)
for(i in 1:length(out))
{
  write.table(out[i], file='Output.csv', append=TRUE, sep=",", quote=FALSE)
}

I enclosed a screenshot of the output file. In it you can see that the tables are incomplete.

Any help would be appreciated.

Thanks
Sripriya.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot.png
Type: image/png
Size: 213656 bytes
Desc: not available
URL: <https://stat.ethz.ch/pipermail/r-help/attachments/20210620/923ac6e0/attachment.png>
Please read the posting guide, linked below, which says:

"For questions about functions in standard packages distributed with R (see the FAQ Add-on packages in R <https://cran.r-project.org/doc/FAQ/R-FAQ.html#Add-on-packages-in-R>), ask questions on R-help. If the question relates to a *contributed package*, e.g., one downloaded from CRAN, try contacting the package maintainer first. You can also use find("functionname") and packageDescription("packagename") to find this information. *Only* send such questions to R-help or R-devel if you get no reply or need further assistance. This applies to both requests for help and to bug reports."

You may get lucky here and someone familiar with the tabulizer package will respond; but you should contact the maintainer about your problem, unless you have already done so and received no response -- in which case say so.

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Mon, Jun 21, 2021 at 2:14 PM Sri Priya <sri.chocho at gmail.com> wrote:

> Dear R Users,
>
> I am working on extracting tables from PDF and I am writing that in a csv
> file. [...]
> ______________________________________________
> R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Richard M. Heiberger
2021-Jun-21 22:17 UTC
[R] [External] Data is not properly written in csv file
Copy and paste from PDF usually scrambles tables. This package is probably suffering from that PDF characteristic.

> On Jun 20, 2021, at 11:03, Sri Priya <sri.chocho at gmail.com> wrote:
>
> Dear R Users,
>
> I am working on extracting tables from PDF and I am writing that in a csv
> file. [...]
Hi,

If the extracted tables do not all have consistent content and structure, that may be causing problems as you append each one to the same file.

You might want to modify your loop so that each table gets written to a different CSV file and see what that looks like.

Also, review ?write.table and take note of the default arguments that are used for write.csv(), as noted in the CSV Files section, and in the Examples.

Regards,

Marc Schwartz

Sri Priya wrote on 6/20/21 11:03 AM:
> Dear R Users,
>
> I am working on extracting tables from PDF and I am writing that in a csv
> file. [...]
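
The suggestion to write each table to its own file can be sketched roughly as follows. This is only an illustration under stated assumptions: it assumes that each element of `out` is a character matrix (which is what extract_tables() is expected to return with its default output), and it uses two made-up matrices as stand-ins so that it runs without tabulizer or the PDF. The file names table_1.csv, table_2.csv and the use of tempdir() are hypothetical choices, not part of the original code.

```r
# Stand-ins for the list returned by extract_tables(); these matrices are
# invented for illustration and deliberately differ in number of columns.
# One cell contains a comma, as formatted numbers in reports often do.
out <- list(
  matrix(c("Item", "Revenue", "Amount", "1,234"), nrow = 2),
  matrix(c("Name", "A", "Year", "2017", "Value", "5"), nrow = 2)
)

out_dir <- tempdir()  # hypothetical output location

files <- character(length(out))
for (i in seq_along(out)) {
  # One file per table, so tables of different widths are not mixed
  files[i] <- file.path(out_dir, sprintf("table_%d.csv", i))
  # out[[i]] extracts the matrix itself; out[i] would be a one-element list.
  # quote = TRUE with qmethod = "double" mirrors what write.csv() uses, so a
  # cell such as "1,234" is not split into two columns when read back.
  write.table(out[[i]], file = files[i], sep = ",",
              row.names = FALSE, col.names = FALSE,
              quote = TRUE, qmethod = "double")
}
```

With real data, the stand-in list would be replaced by out <- extract_tables(location). Whether the extracted tables themselves are complete is a separate question that this sketch does not address; it only avoids the problems that come from appending differently shaped tables, unquoted, to a single file.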
This was an exact duplicate of a posting to StackOverflow, where it has a response. You are asked in the Posting Guide not to crosspost.

-- 
David.

On 6/20/21 8:03 AM, Sri Priya wrote:
> location <- '
> http://keic.mica-apps.net/wwwisis/ET_Annual_Reports/Religare_Enterprises_Ltd/RELIGARE-2017-2018.pdf
> '
>
> # Extract the table
> out <- extract_tables(location)