thr3ads.net - R help - [R] Manage huge database [Sep 2008]

José E. Lozano

2008-Sep-22 06:50 UTC

[R] Manage huge database

Hello,

Recently I have been trying to open a huge database, with no success.

It's a 4 GB plain-text CSV file with around 2000 rows and over 500,000
columns/variables.

I have tried The SAS System, but it reads only around 5000 columns, no
more. R hangs when opening it.

Is there any way to work with "parts" (a set of columns) of this database,
since it's impossible to manage it all at once?

Is there any way to establish a link to the CSV file and to specify the
columns you want to fetch every time you run an analysis?

I've been searching the net, but found little about this topic.

Best regards,

Jose Lozano



Barry Rowlingson

2008-Sep-22 07:08 UTC

2008/9/22 José E. Lozano <lozalojo at jcyl.es>:
> Recently I have been trying to open a huge database with no success.
>
> It's a 4GB csv plain text file with around 2000 rows and over 500,000
> columns/variables.
 I wouldn't call a 4GB csv text file a 'database'.
> Is there any way to work with "parts" (a set of columns) of this database,
> since its impossible to manage it all at once?
 Yes, use a database. A real database.
> Is there any way to establish a link to the csv file and to state the
> columns you want to fetch every time you make an analysis?
 No, but you can establish a link to a database. You want a database.
A real relational database.
> I've been searching the net, but found little about this topic.
Try:
http://cran.r-project.org/doc/manuals/R-data.html#Relational-databases
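
Barry's advice runs into one practical snag with this particular file: relational databases cap the number of columns per table far below 500,000 (SQLite's default limit is 2000 columns and PostgreSQL's is 1600), so the CSV cannot simply be loaded as one wide table. A common workaround is to store it in long format, one (row, variable, value) triple per line. A minimal sketch of that reshaping with awk, assuming the first line holds the variable names and the file is plain CSV with no quoted fields containing commas; wide_to_long is a hypothetical name:

```shell
#!/bin/sh
# wide_to_long FILE
# Reshape a wide CSV (header row of variable names, then one line per
# observation) into "row,variable,value" lines, one per cell.
# Assumes plain CSV: no quoted fields with embedded commas or newlines.
wide_to_long() {
  awk -F',' '
    NR == 1 {
      # remember the variable names from the header row
      for (i = 1; i <= NF; i++) name[i] = $i
      print "row,variable,value"
      next
    }
    {
      # one output line per cell: observation number, variable name, value
      for (i = 1; i <= NF; i++)
        printf "%d,%s,%s\n", NR - 1, name[i], $i
    }' "$1"
}
```

The output could then be bulk-loaded into a three-column table (for example with the SQLite shell's .import command) and indexed on the variable column, so that the variables needed for an analysis can be fetched with a query such as SELECT row, variable, value FROM cells WHERE variable IN ('v101', 'v102'), where cells, v101 and v102 are placeholder names. With 2000 rows and 500,000 columns this means about a billion lines, so the long file will be considerably larger than the original.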

Barry

Yihui Xie

2008-Sep-22 07:35 UTC

Hi,

You can treat it as a database and use ODBC to fetch data from the CSV
file using SQL. See the package RODBC for details about database
connections. (I have dealt with similar problems before with RODBC)

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China




José E. Lozano

2008-Sep-22 07:49 UTC

Hello, Yihui
> You can treat it as a database and use ODBC to fetch data from the CSV
> file using SQL. See the package RODBC for details about database
> connections. (I have dealt with similar problems before with RODBC)
Thanks for your tip. I have used RODBC before to read data from MS Access and
MS Excel files, but I never imagined it could work for non-database files
such as CSV.

I will check the RODBC documentation.

Best Regards,
Jose Lozano

------------------------------------------
Jose E. Lozano Alonso
Observatorio de Salud Pública.
Direccion General de Salud Pública e I+D+I.
Junta de Castilla y León.
Address: Paseo de Zorrilla, nº1. Office 3103. Postal code 47071. Valladolid.

Gabor Grothendieck

2008-Sep-22 16:52 UTC

Try this:

read.table(pipe("/Rtools/bin/gawk -f cut.awk bigdata.dat"))

where cut.awk contains the single line (assuming you
want fields 101 through 110 and no others):

{ for (i = 101; i <= 110; i++) printf("%s ", $i); printf "\n" }

or just use cut.  I tried the gawk command above on Windows
Vista with an artificial file of 500,000 columns and 2 rows and it seemed
instantaneous.
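
To reproduce a timing test like this before touching the real data, a synthetic wide file can be generated with awk as well. A minimal sketch, assuming a POSIX shell and awk; make_wide and the r<row>c<col> cell values are hypothetical choices for illustration, not the file Gabor actually used:

```shell
#!/bin/sh
# make_wide ROWS COLS FILE
# Write a synthetic comma-separated file with ROWS rows and COLS columns.
# Cell values are "r<row>c<col>", so extracted columns are easy to check.
make_wide() {
  awk -v rows="$1" -v cols="$2" 'BEGIN {
    for (r = 1; r <= rows; r++) {
      # print one field at a time instead of building a very long
      # line in memory by repeated string concatenation
      for (c = 1; c <= cols; c++)
        printf "%sr%dc%d", (c > 1 ? "," : ""), r, c
      printf "\n"
    }
  }' > "$3"
}
```

For example, make_wide 2 500000 wide.csv followed by time cut -d, -f101-110 wide.csv gives a rough idea of how the extraction behaves before it is run on the 4 GB file.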

On Windows the above uses gawk from Rtools available at:
   http://www.murdoch-sutherland.com/Rtools/
or you can separately install gawk.  Rtools also has cut if you
prefer that.
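
Note that cut.awk splits on whitespace, awk's default field separator; for a comma-separated file like the one in the question the separator has to be set to a comma. A minimal sketch of a CSV-aware variant that takes the column range as arguments, assuming plain CSV with no quoted fields containing commas; select_cols and its FROM/TO arguments are hypothetical names:

```shell
#!/bin/sh
# select_cols FROM TO FILE
# Print fields FROM..TO (1-based, inclusive) of a comma-separated FILE,
# re-joined with commas so the output is still CSV.
# Assumes plain CSV: no quoted fields with embedded commas or newlines.
select_cols() {
  awk -F',' -v from="$1" -v to="$2" '
    BEGIN { from += 0; to += 0 }   # force numeric comparisons
    {
      # stop at the last field if TO is beyond the end of the line
      for (i = from; i <= to && i <= NF; i++)
        printf "%s%s", (i > from ? "," : ""), $i
      printf "\n"
    }' "$3"
}
```

The same columns can be taken with cut -d, -f101-110 bigdata.csv, and either command can be read into R as in the message above, e.g. read.csv(pipe("cut -d, -f101-110 bigdata.csv")), using the full path to cut on Windows.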

