thr3ads.net - R help - [R] Importing big plain files from ERP-System/Data Mining with R [Oct 2004]

r-help.20.stefan817@spamgourmet.com

2004-Oct-26 06:51 UTC

[R] Importing big plain files from ERP-System/Data Mining with R

Hi,

How can I import really big plain-text data files (several GB) from an
ERP system (SAP tables) into R?
The headers of these files always look similar, for example:

Tabelle:        T009
Angezeigte Felder:  7 von  7  Feststehende Führungsspalten: 2  Listbreite 0250
----------------------------------------------------------------------
|X|MANDT|PERIV|XKALE|XJABH|ANZBP|ANZSP|LTEXT                         |
----------------------------------------------------------------------
|X|001  |01   |X    |     |012  |02   |ABC                           |
|X|001  |V9   |     |     |012  |04   |Okt. - Sep., 4 Sonderperioden |
|X|001  |WK   |     |X    |053  |00   |Kalenderwochen                |
----------------------------------------------------------------------

(Each downloaded table includes the first 5 header rows shown above, with the
field names in row 4. A single row is longer than 1023 bytes, there are more
than 256 fields, each file is several GB, and there are several million records.)

What is an appropriate way to read such tables in?
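One possible way to handle this layout in base R is sketched below. It assumes exactly what the post describes (five header rows, field names on row 4, `|` as the delimiter, separator lines made only of dashes); the helper names `split_sap_row` and `parse_sap_list` are hypothetical, not part of R or of any SAP tool. Because the parser works on a character vector of lines, it could also be applied to successive chunks returned by `readLines()` rather than to a whole multi-GB file at once.

```r
# Hypothetical helper: split "|a|b|c|" style rows into trimmed fields.
split_sap_row <- function(x) {
  x <- sub("^\\|", "", x)        # drop the leading "|"
  x <- sub("\\|\\s*$", "", x)    # drop the trailing "|" and any trailing blanks
  lapply(strsplit(x, "|", fixed = TRUE),
         function(f) gsub("^\\s+|\\s+$", "", f))  # trim fixed-width padding
}

# Hypothetical helper: turn lines of an SAP list export into a data frame.
# Assumes header_rows header lines, field names on line names_row, and
# separator lines consisting only of "-" characters.
parse_sap_list <- function(lines, header_rows = 5L, names_row = 4L,
                           col_names = NULL) {
  if (is.null(col_names)) col_names <- split_sap_row(lines[names_row])[[1]]
  body <- if (header_rows > 0L) lines[-seq_len(header_rows)] else lines
  body <- body[grepl("\\S", body) & !grepl("^-+\\s*$", body)]  # drop blanks/separators
  if (length(body) == 0L) return(NULL)
  mat <- do.call(rbind, split_sap_row(body))    # one matrix row per record
  colnames(mat) <- col_names
  as.data.frame(mat, stringsAsFactors = FALSE)  # keep codes like "001" as text
}

dashes <- paste(rep("-", 70), collapse = "")
example <- c(
  "Tabelle:        T009",
  "Angezeigte Felder:  7 von  7  Feststehende F\u00fchrungsspalten: 2  Listbreite 0250",
  dashes,
  "|X|MANDT|PERIV|XKALE|XJABH|ANZBP|ANZSP|LTEXT                         |",
  dashes,
  "|X|001  |01   |X    |     |012  |02   |ABC                           |",
  "|X|001  |V9   |     |     |012  |04   |Okt. - Sep., 4 Sonderperioden |",
  "|X|001  |WK   |     |X    |053  |00   |Kalenderwochen                |",
  dashes
)
tab <- parse_sap_list(example)
print(tab)
```

For chunked reading, only the first chunk carries the header; later chunks could be parsed with `header_rows = 0L` and `col_names = names(tab)`.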

Greetings
Stefan

P.S. I am a beginner with R. Until now I have used ACL (http://www.acl.com)
for data mining, and this is my first attempt with R.
Yes, I have
[X] Read R Data Import/Export
[X] Read Using R for Data Analysis
[X] Read Simple R
[X] Read Manuals
[X] Read the read.table() and scan() documentation

Prof Brian Ripley

2004-Oct-26 07:23 UTC

[R] Importing big plain files from ERP-System/Data Mining with R

On Tue, 26 Oct 2004 r-help.20.stefan817 at spamgourmet.com wrote:
> how can I import really big plain text data files (several GB) from an
Unlikely unless you have a 64-bit platform.

Only starting with R 2.0.0 can some 32-bit versions of R access files >
2Gb, and to import the file into R you need enough address space in R for
the object, which is normally more than the file size.
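To make the address-space point concrete, the sketch below (an editorial illustration, not from the thread) writes a small pipe-delimited file of single-digit integers, reads it back with `read.table()`, and compares the bytes on disk with `object.size()` of the result. Each value takes 2 bytes in the file (a digit plus a delimiter or newline) but 4 bytes as an R integer, so the object ends up roughly twice the file size, before counting the extra working memory `read.table()` itself uses while parsing. The file contents and sizes are arbitrary assumptions.

```r
set.seed(42)
n_rows <- 10000L
# 10 columns of single-digit integers, so every text field is 1 character
m <- matrix(sample(0:9, n_rows * 10L, replace = TRUE), ncol = 10L)
tmp <- tempfile(fileext = ".txt")
write.table(m, tmp, sep = "|", row.names = FALSE, col.names = FALSE)
df <- read.table(tmp, sep = "|")            # columns come back as integer
file_bytes <- file.info(tmp)$size           # size on disk
obj_bytes  <- as.numeric(object.size(df))   # size of the R object in memory
cat("on disk:", file_bytes, "bytes; in R:", obj_bytes, "bytes\n")
unlink(tmp)
```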

Almost certainly not if the unmentioned platform is Windows, but you could 
access the data from a DBMS.

> ERP-System (SAP-Tables) to R?
> The Header of these files are always similar, for example:
> 
> Tabelle:        T009
> Angezeigte Felder:  7 von  7  Feststehende Führungsspalten: 2  Listbreite 0250
> ----------------------------------------------------------------------
> |X|MANDT|PERIV|XKALE|XJABH|ANZBP|ANZSP|LTEXT                         |
> ----------------------------------------------------------------------
> |X|001  |01   |X    |     |012  |02   |ABC                           |
> |X|001  |V9   |     |     |012  |04   |Okt. - Sep., 4 Sonderperioden |
> |X|001  |WK   |     |X    |053  |00   |Kalenderwochen                |
> ----------------------------------------------------------------------
> 
> (including the first 5 rows in each downloaded table, row # 4 =field names,
> length of 1 row > 1023 bytes, count of fields > 256, size = several GB,
> count records = several million)
> 
> What is an appropriate way to read such tables in?
> 
> Greetings
> Stefan
> 
> P.S. I am a beginner with R. Until now I have used ACL (http://www.acl.com)
> for data mining purposes and I'm doing now my first try with R.
> Yes, I have
> [X] Read R Data Import/Export
> [X] Read Using R for Data Analysis
> [X] Read Simple R
> [X] Read Manuals
> [X] Read read.table() and scan() command
but you have not told us your platform.

-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

Vito Ricci

2004-Oct-26 08:23 UTC

[R] Re: Importing big plain files from ERP-System/Data Mining with R

Hi,
regarding R, data mining and large databases, you can
see these resources:

Diego Kuonen, Introduction au data mining avec R :
vers la reconquête du `knowledge discovery in
databases' par les statisticiens. Bulletin of the
Swiss Statistical Society, 40:3-7, 2001.
http://www.statoo.com/en/publications/2001.R.SSS.40/

Diego Kuonen and Reinhard Furrer, Data mining avec R
dans un monde libre. Flash Informatique Spécial été,
pages 45-50, September 2001.
http://sawww.epfl.ch/SIC/SA/publications/FI01/fi-sp-1/sp-1-page45.html


Brian D. Ripley, Datamining: Large Databases and
Methods, in Proceedings  of "useR! 2004 - The R User
Conference", May 2004
http://www.ci.tuwien.ac.at/Conferences/useR-2004/Keynotes/Ripley.pdf

Brian D. Ripley, Using Databases with R, R News,
January 2001, pp. 18-20
http://cran.r-project.org/doc/Rnews/Rnews_2001-1.pdf

B. D. Ripley, R. M. Ripley, Applications of R Clients
and Servers in Proceedings of the Distributed
Statistical Computing 2001 Workshop, 2001, Vienna
University of Technology.
http://www.ci.tuwien.ac.at/Conferences/DSC-2001/Proceedings/Ripley.pdf

Torsten Hothorn, David A. James, Brian D. Ripley, R/S
Interfaces to Databases in Proceedings of the
Distributed Statistical Computing 2001 Workshop,
2001, Vienna University of Technology.
http://www.ci.tuwien.ac.at/Conferences/DSC-2001/Proceedings/HothornJamesRipley.pdf

Luís Torgo, Data Mining with R. Learning by case
studies, May 2003
http://www.liacc.up.pt/~ltorgo/DataMiningWithR/

Best
Vito



r-help.20.stefan817@spamgourmet.com

2004-Oct-26 12:11 UTC

[R] Importing big plain files from ERP-System/Data Mining with R

On Tue, 26 Oct 2004 r-help.20.stefan817 at spamgourmet.com wrote:
>> how can I import really big plain text data files (several GB) from an
>Unlikely unless you have a 64-bit platform.
Why? I have a 32-bit Windows XP platform running R 2.0.0. With ACL 8.21, for
example, 10 GB files were no problem.
>Only starting with R 2.0.0 can some 32-bit versions of R access files >
>2Gb, and to import the file into R you need enough address space in R for
>the object, which is normally more than the file size.
Is this really so? I want to summarize the data or calculate clusters, so only
the aggregated information needs to be in memory. Does R first import the whole
file and then calculate with it? In ACL the concept is to leave the file itself
on the hard disk, scan it for each calculation and keep only the calculation
in memory. (Surely not very fast, but probably the only workable method for big files.)
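The single-pass, ACL-style approach described here can be sketched in base R under stated assumptions: an open `file()` connection is read in blocks with `readLines(con, n = ...)`, each call continues where the previous one stopped, and only running totals stay in memory. The function name `chunked_sum`, the chunk size, the `|` delimiter and the choice of a sum/mean as the aggregate are illustrative assumptions; clustering would need an algorithm whose state can be updated chunk by chunk.

```r
# Hypothetical helper: one pass over a delimited file, chunk by chunk,
# keeping only running totals for one numeric column in memory.
chunked_sum <- function(path, col, sep = "|", chunk_lines = 10000L,
                        skip = 0L) {
  con <- file(path, open = "r")
  on.exit(close(con))
  if (skip > 0L) readLines(con, n = skip)     # discard header lines
  total <- 0
  count <- 0
  repeat {
    lines <- readLines(con, n = chunk_lines)  # continues where the last call stopped
    if (length(lines) == 0L) break            # end of file
    fields <- strsplit(lines, sep, fixed = TRUE)
    vals <- as.numeric(vapply(fields, `[`, character(1), col))
    total <- total + sum(vals, na.rm = TRUE)
    count <- count + sum(!is.na(vals))
  }
  c(sum = total, n = count, mean = total / count)
}

# Usage: a small stand-in file with one header line and 25,000 records.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id|value", paste(1:25000, (1:25000) %% 7, sep = "|")), tmp)
res <- chunked_sum(tmp, col = 2, skip = 1)
print(res)
unlink(tmp)
```

Memory use here depends on `chunk_lines`, not on the file size, which is the property asked about.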
>Almost certainly not if the unmentioned platform is Windows, but you could 
>access the data from a DBMS.
I can do that too, but with several limitations.
Greetings
Stefan
