r-help.20.stefan817@spamgourmet.com
2004-Oct-26 06:51 UTC
[R] Importing big plain files from ERP-System/Data Mining with R
Hi,

how can I import really big plain text data files (several GB) from an ERP system (SAP tables) into R? The headers of these files are always similar, for example:

Tabelle: T009
Angezeigte Felder: 7 von 7  Feststehende Führungsspalten: 2  Listbreite 0250
----------------------------------------------------------------------
|X|MANDT|PERIV|XKALE|XJABH|ANZBP|ANZSP|LTEXT                         |
----------------------------------------------------------------------
|X|001  |01   |X    |     |012  |02   |ABC                           |
|X|001  |V9   |     |     |012  |04   |Okt. - Sep., 4 Sonderperioden |
|X|001  |WK   |     |X    |053  |00   |Kalenderwochen                |
----------------------------------------------------------------------

(including the first 5 rows in each downloaded table; row #4 = field names, length of 1 row > 1023 bytes, count of fields > 256, size = several GB, count of records = several million)

What is an appropriate way to read such tables in?

Greetings
Stefan

P.S. I am a beginner with R. Until now I have used ACL (http://www.acl.com) for data mining purposes, and this is my first attempt with R. Yes, I have
[X] Read R Data Import/Export
[X] Read Using R for Data Analysis
[X] Read Simple R
[X] Read Manuals
[X] Read read.table() and scan() command
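A minimal sketch of how a listing in this layout could be parsed in R, for a sample small enough to hold in memory. It assumes that every row of interest starts with "|", that the first such row holds the field names, and that no field value itself contains "|". The function name parse_sap_listing and the column padding of the sample are illustrative assumptions, not part of any SAP or R interface.

```r
# Hypothetical helper: turn the lines of an SAP list export into a data frame.
# Assumes rows of interest start with "|" and that fields never contain "|".
parse_sap_listing <- function(lines) {
  rows <- lines[startsWith(lines, "|")]          # drops title and dashed rows
  cells <- lapply(strsplit(rows, "|", fixed = TRUE),
                  function(p) trimws(p[-1]))     # p[1] is the empty piece before the first "|"
  header <- cells[[1]]                           # first "|" row holds the field names
  body <- lapply(cells[-1],
                 function(r) r[seq_along(header)])  # pad short rows with NA
  df <- as.data.frame(do.call(rbind, body), stringsAsFactors = FALSE)
  names(df) <- header
  df
}

# Sample taken from the post above (the column padding is a guess).
sample_lines <- c(
  "Tabelle: T009",
  "Angezeigte Felder: 7 von 7  Feststehende Führungsspalten: 2  Listbreite 0250",
  "----------------------------------------------------------------------",
  "|X|MANDT|PERIV|XKALE|XJABH|ANZBP|ANZSP|LTEXT                         |",
  "----------------------------------------------------------------------",
  "|X|001  |01   |X    |     |012  |02   |ABC                           |",
  "|X|001  |V9   |     |     |012  |04   |Okt. - Sep., 4 Sonderperioden |",
  "|X|001  |WK   |     |X    |053  |00   |Kalenderwochen                |",
  "----------------------------------------------------------------------"
)

t009 <- parse_sap_listing(sample_lines)
str(t009)
```

All columns come back as character vectors with surrounding blanks trimmed, so codes such as "001" keep their leading zeros; converting selected columns to numeric would be a separate, deliberate step.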
Prof Brian Ripley
2004-Oct-26 07:23 UTC
[R] Importing big plain files from ERP-System/Data Mining with R
On Tue, 26 Oct 2004 r-help.20.stefan817 at spamgourmet.com wrote:

> how can I import really big plain text data files (several GB) from an

Unlikely unless you have a 64-bit platform. Only starting with R 2.0.0 can some 32-bit versions of R access files > 2 GB, and to import the file into R you need enough address space in R for the object, which is normally more than the file size. Almost certainly not if the unmentioned platform is Windows, but you could access the data from a DBMS.

> ERP-System (SAP-Tables) to R?
> The Header of these files are always similar, for example:
>
> Tabelle: T009
> Angezeigte Felder: 7 von 7  Feststehende Führungsspalten: 2  Listbreite 0250
> ----------------------------------------------------------------------
> |X|MANDT|PERIV|XKALE|XJABH|ANZBP|ANZSP|LTEXT                         |
> ----------------------------------------------------------------------
> |X|001  |01   |X    |     |012  |02   |ABC                           |
> |X|001  |V9   |     |     |012  |04   |Okt. - Sep., 4 Sonderperioden |
> |X|001  |WK   |     |X    |053  |00   |Kalenderwochen                |
> ----------------------------------------------------------------------
>
> (including the first 5 rows in each downloaded table, row # 4 =field names,
> length of 1 row > 1023 bytes, count of fields > 256, size = several GB,
> count records = several million)
>
> What is an appropriate way to read such tables in?
>
> Greetings
> Stefan
>
> P.S. I am a beginner with R. Until now I have used ACL (http://www.acl.com)
> for data mining purposes and I'm doing now my first try with R.
> Yes, I have
> [X] Read R Data Import/Export
> [X] Read Using R for Data Analysis
> [X] Read Simple R
> [X] Read Manuals
> [X] Read read.table() and scan() command

but you have not told us your platform.

--
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
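The address-space point can be made concrete with a rough back-of-the-envelope calculation. The sketch below assumes 5 million records (one reading of "several million"), 256 fields (the lower bound given in the post) and 8 bytes per value, as if every field were stored as a double. These numbers are assumptions for illustration only; the real footprint depends on the column types and on how R stores character data.

```r
# Rough size estimate for holding the whole table in memory at once.
n_records        <- 5e6   # assumption: "several million" records
n_fields         <- 256   # assumption: lower bound on the field count from the post
bytes_per_double <- 8     # a double-precision value occupies 8 bytes

est_gib <- n_records * n_fields * bytes_per_double / 2^30

# A 32-bit process can address at most 2^32 bytes, i.e. 4 GiB in total.
address_space_gib <- 2^32 / 2^30
fits_in_32bit <- est_gib < address_space_gib

cat(sprintf("estimated size: %.1f GiB, 32-bit address space: %.0f GiB, fits: %s\n",
            est_gib, address_space_gib, fits_in_32bit))
```

Under these assumptions the object alone is larger than the entire address space of a 32-bit process, before counting any temporary copies made during import.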
Vito Ricci
2004-Oct-26 08:23 UTC
[R] Re: Importing big plain files from ERP-System/Data Mining with R
Hi,

concerning R, data mining and large databases, you can look at these resources:

Diego Kuonen, Introduction au data mining avec R : vers la reconquête du `knowledge discovery in databases' par les statisticiens. Bulletin of the Swiss Statistical Society, 40:3-7, 2001.
http://www.statoo.com/en/publications/2001.R.SSS.40/

Diego Kuonen and Reinhard Furrer, Data mining avec R dans un monde libre. Flash Informatique Spécial été, pages 45-50, Sep 2001.
http://sawww.epfl.ch/SIC/SA/publications/FI01/fi-sp-1/sp-1-page45.html

Brian D. Ripley, Datamining: Large Databases and Methods, in Proceedings of "useR! 2004 - The R User Conference", May 2004.
http://www.ci.tuwien.ac.at/Conferences/useR-2004/Keynotes/Ripley.pdf

Brian D. Ripley, Using Databases with R, R News, January 2001, pp. 18-20.
http://cran.r-project.org/doc/Rnews/Rnews_2001-1.pdf

B. D. Ripley, R. M. Ripley, Applications of R Clients and Servers, in Proceedings of the Distributed Statistical Computing 2001 Workshop, 2001, Vienna University of Technology.
http://www.ci.tuwien.ac.at/Conferences/DSC-2001/Proceedings/Ripley.pdf

Torsten Hothorn, David A. James, Brian D. Ripley, R/S Interfaces to Databases, in Proceedings of the Distributed Statistical Computing 2001 Workshop, 2001, Vienna University of Technology.
http://www.ci.tuwien.ac.at/Conferences/DSC-2001/Proceedings/HothornJamesRipley.pdf

Luís Torgo, Data Mining with R. Learning by case studies, May 2003.
http://www.liacc.up.pt/~ltorgo/DataMiningWithR/

Best
Vito

====
Become builders of solutions
"The business of the statistician is to catalyze the scientific learning process." George E. P. Box
Visit the portal http://www.modugno.it/ and in particular the section on Palese http://www.modugno.it/archivio/cat_palese.shtml
r-help.20.stefan817@spamgourmet.com
2004-Oct-26 12:11 UTC
[R] Importing big plain files from ERP-System/Data Mining with R
On Tue, 26 Oct 2004 r-help.20.stefan817 at spamgourmet.com wrote:

>> how can I import really big plain text data files (several GB) from an

> Unlikely unless you have a 64-bit platform.

Why? I have a 32-bit Win XP platform running R 2.0.0. With ACL 8.21, files of e.g. 10 GB were no problem.

> Only starting with R 2.0.0 can some 32-bit versions of R access files
> > 2 GB, and to import the file into R you need enough address space in R for
> the object, which is normally more than the file size.

Is this really so? I want to summarize the data or calculate clusters, so only the aggregated information should be in memory. Does R first import the whole file and then calculate with it? In ACL the concept is to leave the file itself on the hard disk, scanning it for each calculation and doing only the calculation in memory. (Surely not very fast, but probably the only method for big files.)

> Almost certainly not if the unmentioned platform is Windows, but you could
> access the data from a DBMS.

I can do this also, but with several limitations.

>> ERP-System (SAP-Tables) to R?
>> The Header of these files are always similar, for example:
>>
>> Tabelle: T009
>> Angezeigte Felder: 7 von 7  Feststehende Führungsspalten: 2  Listbreite 0250
>> ----------------------------------------------------------------------
>> |X|MANDT|PERIV|XKALE|XJABH|ANZBP|ANZSP|LTEXT                         |
>> ----------------------------------------------------------------------
>> |X|001  |01   |X    |     |012  |02   |ABC                           |
>> |X|001  |V9   |     |     |012  |04   |Okt. - Sep., 4 Sonderperioden |
>> |X|001  |WK   |     |X    |053  |00   |Kalenderwochen                |
>> ----------------------------------------------------------------------
>>
>> (including the first 5 rows in each downloaded table, row # 4 =field names,
>> length of 1 row > 1023 bytes, count of fields > 256, size = several GB,
>> count records = several million)
>>
>> What is an appropriate way to read such tables in?

Greetings
Stefan
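The "leave the file on disk and scan it" approach described above can be approximated in R by reading through a file connection in chunks and keeping only running totals. The following is a minimal sketch under stated assumptions: the layout is the one shown in the post (5 header rows, field names in row 4, data rows starting with "|"), no field value contains "|", and the helper name count_by_field and its parameters are hypothetical. It only covers aggregates that can be accumulated chunk by chunk, such as counts or sums; clustering would need a different strategy.

```r
# Hypothetical helper: count records per value of one field while reading the
# file in chunks, so only one chunk plus the running totals are in memory.
count_by_field <- function(path, field, header_rows = 5L, chunk_lines = 10000L) {
  con <- file(path, open = "r")
  on.exit(close(con))

  header <- readLines(con, n = header_rows)      # per the post, row 4 holds the field names
  fields <- trimws(strsplit(header[4], "|", fixed = TRUE)[[1]])
  idx <- match(field, fields)
  if (is.na(idx)) stop("field not found: ", field)

  totals <- numeric(0)                           # named vector of running counts
  repeat {
    lines <- readLines(con, n = chunk_lines)
    if (length(lines) == 0L) break               # end of file
    lines <- lines[startsWith(lines, "|")]       # skip dashed separator rows
    if (length(lines) == 0L) next
    values <- vapply(strsplit(lines, "|", fixed = TRUE),
                     function(p) trimws(p[idx]), character(1))
    chunk <- table(values)
    if (length(chunk) == 0L) next
    merged <- c(totals, setNames(as.numeric(chunk), names(chunk)))
    totals <- vapply(split(merged, names(merged)), sum, numeric(1))
  }
  totals
}

# Demonstration on the sample from the post, written to a temporary file.
path <- tempfile(fileext = ".txt")
writeLines(c(
  "Tabelle: T009",
  "Angezeigte Felder: 7 von 7  Feststehende Führungsspalten: 2  Listbreite 0250",
  "----------------------------------------------------------------------",
  "|X|MANDT|PERIV|XKALE|XJABH|ANZBP|ANZSP|LTEXT                         |",
  "----------------------------------------------------------------------",
  "|X|001  |01   |X    |     |012  |02   |ABC                           |",
  "|X|001  |V9   |     |     |012  |04   |Okt. - Sep., 4 Sonderperioden |",
  "|X|001  |WK   |     |X    |053  |00   |Kalenderwochen                |",
  "----------------------------------------------------------------------"
), path)

# A tiny chunk size is used here only to exercise the chunk-merging logic.
per_mandt <- count_by_field(path, "MANDT", chunk_lines = 2L)
per_periv <- count_by_field(path, "PERIV", chunk_lines = 2L)
print(per_mandt)
print(per_periv)
```

Memory use is bounded by chunk_lines and the number of distinct values of the chosen field, not by the file size, at the cost of one full pass over the file per aggregate.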