Hello,

Recently I have been trying to open a huge database with no success.

It's a 4GB CSV plain-text file with around 2000 rows and over 500,000 columns/variables.

I have tried the SAS System, but it reads only around 5000 columns, no more. R hangs when opening it.

Is there any way to work with "parts" (a set of columns) of this database, since it's impossible to manage it all at once?

Is there any way to establish a link to the CSV file and to state the columns you want to fetch every time you make an analysis?

I've been searching the net, but found little about this topic.

Best regards,
Jose Lozano
2008/9/22 José E. Lozano <lozalojo at jcyl.es>:
> Recently I have been trying to open a huge database with no success.
>
> It's a 4GB csv plain text file with around 2000 rows and over 500,000
> columns/variables.

I wouldn't call a 4GB csv text file a 'database'.

> Is there any way to work with "parts" (a set of columns) of this database,
> since its impossible to manage it all at once?

Yes, use a database. A real database.

> Is there any way to establish a link to the csv file and to state the
> columns you want to fetch every time you make an analysis?

No, but you can establish a link to a database. You want a database. A real relational database.

> I've been searching the net, but found little about this topic.

Try:
http://cran.r-project.org/doc/manuals/R-data.html#Relational-databases

Barry
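One caveat when following this advice: many relational databases cap the number of columns per table (SQLite's default limit is 2000, PostgreSQL's is 1600), so a 500,000-column file will not load as a wide table. A common workaround, sketched here as an assumption rather than anything proposed in the thread, is to reshape the wide CSV into a long "row, column, value" file that any database can import and index. The file names are hypothetical, and the sketch assumes plain comma separation with no quoted fields containing commas.

```shell
#!/bin/sh
# Sketch: reshape a wide CSV (one line per record, one field per variable)
# into a long CSV with one line per cell: row_id,col_id,value.
# Assumes plain comma separation with no quoted commas; file names are hypothetical.

wide_to_long() {
  # $1: input wide CSV, $2: output long CSV
  awk -F, '
    BEGIN { OFS = "," }
    # Emit one output line per field: record number, field number, value.
    { for (i = 1; i <= NF; i++) print NR, i, $i }
  ' "$1" > "$2"
}

# Small demonstration on a 2-row, 3-column file.
printf '1,2,3\n4,5,6\n' > wide_demo.csv
wide_to_long wide_demo.csv long_demo.csv
cat long_demo.csv
```

Once the long file is loaded into a table indexed on the column number, fetching a set of variables becomes a query along the lines of `WHERE col_id IN (101, 102, 103)`, and the result can be reshaped back to wide form in R. Note that for the file described above this produces roughly 2000 × 500,000 = one billion rows, so the database must be able to handle a table of that size.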
Hi,

You can treat it as a database and use ODBC to fetch data from the CSV file using SQL. See the package RODBC for details about database connections. (I have dealt with similar problems before with RODBC.)

Regards,
Yihui
--
Yihui Xie <xieyihui at gmail.com>
Phone: +86-(0)10-82509086 Fax: +86-(0)10-82509086
Mobile: +86-15810805877
Homepage: http://www.yihui.name
School of Statistics, Room 1037, Mingde Main Building,
Renmin University of China, Beijing, 100872, China
Hello, Yihui

> You can treat it as a database and use ODBC to fetch data from the CSV
> file using SQL. See the package RODBC for details about database
> connections. (I have dealt with similar problems before with RODBC)

Thanks for your tip. I have used RODBC before to read data from MS Access and MS Excel files, but I never imagined it could work for non-database files such as CSV. I will check the RODBC documentation.

Best Regards,
Jose Lozano

------------------------------------------
Jose E. Lozano Alonso
Observatorio de Salud Pública.
Direccion General de Salud Pública e I+D+I.
Junta de Castilla y León.
Address: Paseo de Zorrilla, nº1. Office 3103. CP 47071. Valladolid.
Try this:

read.table(pipe("/Rtools/bin/gawk -f cut.awk bigdata.dat"))

where cut.awk contains the following (assuming you want fields 101 through 110 and none other):

# the file is comma-separated, but gawk splits on whitespace by default
BEGIN { FS = "," }
{ for(i = 101; i <= 110; i++) printf("%s ", $i); printf "\n" }

or just use cut.

I tried the gawk command above on Windows Vista with an artificial file of 500,000 columns and 2 rows and it seemed instantaneous.

On Windows the above uses gawk from Rtools, available at:
http://www.murdoch-sutherland.com/Rtools/
or you can install gawk separately. Rtools also has cut if you prefer that.

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
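For the "just use cut" route, note that cut splits on tabs by default, so the comma delimiter has to be given explicitly with `-d,`. A minimal sketch, assuming a plain CSV without quoted commas and hypothetical file names: `extract_range` pulls a range of fields by position, and the hypothetical `cols_by_name` helper reads the header line to find column positions, so variables can be requested by name, which comes close to "stating the columns you want to fetch" on each analysis.

```shell
#!/bin/sh
# Extract a contiguous range of columns by position (fields are 1-based).
extract_range() {
  # $1: CSV file, $2: field list in cut syntax, e.g. 101-110
  cut -d, -f"$2" "$1"
}

# Hypothetical helper: select columns by header name instead of position.
# Usage: cols_by_name file.csv name1 name2 ...
cols_by_name() {
  file=$1
  shift
  # Join the requested names with commas so awk can split them into an array.
  wanted=$(printf '%s,' "$@")
  awk -F, -v wanted="$wanted" '
    BEGIN { OFS = ","; n = split(wanted, names, ",") }
    NR == 1 {
      # Map each header name to its field number.
      for (i = 1; i <= NF; i++) pos[$i] = i
      for (k = 1; k <= n; k++) {
        if (names[k] == "") continue   # skip the empty piece after the trailing comma
        if (!(names[k] in pos)) { print "unknown column: " names[k] > "/dev/stderr"; exit 1 }
        idx[++m] = pos[names[k]]
      }
    }
    {
      # Print the selected fields (header included) in the requested order.
      line = ""
      for (k = 1; k <= m; k++) line = line (k > 1 ? OFS : "") $idx[k]
      print line
    }
  ' "$file"
}

# Small demonstration file with a header row.
printf 'id,age,sex,w\n1,30,F,2.5\n2,41,M,3.0\n' > demo.csv
extract_range demo.csv 2-3
cols_by_name demo.csv w id
```

From R, the output of either command can be read through a pipe connection, for example `read.csv(pipe("cut -d, -f101-110 bigdata.dat"))`, assuming cut is on the PATH (on Windows, the copy shipped with Rtools). Because the header line passes through the same selection, `read.csv` keeps the matching variable names.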