Hi, I have a question: I'm not able to import a CSV file which contains a big dataset (100,000 records). Does anyone know how many records R can handle without giving problems? What I'm seeing when I try to import the file is that R works through the 100,000 records very slowly... thanks a lot!!!
I do not know what the limit for R is. But for your problem you might try this:
- Install a MySQL server (download from www.mysql.com)
- From inside MySQL, import the CSV into a MySQL table
- Then, using RMySQL or RODBC, choose the fields you need and import only those into R.
Good luck
Caveman
-- Databases, Data Analysis and OpenSource Software Consultant CENFOSS (www.cenfoss.co.mz) email: orvaquim at cenfoss.co.mz cell: +258828810980
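For the last step, a minimal sketch of the RMySQL route might look like the following (the credentials, database name `mydb`, table name `mytable`, and field names are all assumptions; adjust them to your setup):

```r
## Hypothetical sketch: pull selected fields from MySQL into R.
## Assumes the RMySQL package is installed and the CSV has already
## been loaded into the table 'mytable' in database 'mydb'.
library(RMySQL)

con <- dbConnect(MySQL(), user = "myuser", password = "mypass",
                 dbname = "mydb", host = "localhost")

## Select only the columns you actually need -- this keeps the
## object that reaches R as small as possible.
dat <- dbGetQuery(con, "SELECT field1, field2 FROM mytable")

dbDisconnect(con)
```

The point of this route is that the database does the heavy parsing once, and R only ever sees the subset you query.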
On 27.03.2010 10:19, n.vialma at libero.it wrote:
> What im facing when i try to import the file is that R generates more than 100.000 records and is very slow...
> thanks a lot!!!

Maybe your physical memory is too limited. R keeps its data in RAM, and if your data are too large, Linux and Windows start to use the swap file, which slows down not only R but your whole computer. hth Stefan
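To check whether memory pressure is really the issue, base R offers a couple of quick diagnostics; a small sketch (the example object is, of course, a stand-in for your own data):

```r
## Rough memory diagnostics in base R.
x <- rnorm(1e6)                        # stand-in for a large object

print(object.size(x), units = "Mb")    # size of one object in memory
gc()                                   # triggers a garbage collection and
                                       # reports how much memory R is using
```

If the totals reported by gc() approach your physical RAM, the operating system will start swapping, which matches the slowdown described above.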
A little more information would help, such as the number of columns; I imagine it must be large, because 100,000 rows isn't overwhelming. Second, does read.csv() fail, or does it work, but only after a long time? And third, how much RAM do you have available? R Core provides some guidelines in the R Installation and Administration manual suggesting that a single object of around 10% of your RAM is reasonable, but beyond that things can become challenging, particularly once you start working with your data. There is a wide range of packages to help with large data sets; for example, RMySQL supports MySQL databases. At the other end of the spectrum, there are possibilities discussed on a nice page by Dirk Eddelbuettel which you might look at: http://cran.r-project.org/web/views/HighPerformanceComputing.html Jay -- John W. Emerson (Jay) Associate Professor of Statistics Department of Statistics Yale University http://www.stat.yale.edu/~jay
Try using read.csv.sql in sqldf. See example 13 on the sqldf home page: http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql Also read ?read.csv.sql
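A minimal read.csv.sql call might look like this (the file name `mydata.csv` and the column `field1` are assumptions; behind the scenes the file is loaded into a temporary SQLite database, so only the selected rows ever reach R):

```r
## Sketch using sqldf's read.csv.sql; file and column names are hypothetical.
library(sqldf)

## Read the whole file via SQLite rather than read.csv:
DF <- read.csv.sql("mydata.csv", sql = "select * from file")

## Or filter while reading, so only matching rows enter R:
DF2 <- read.csv.sql("mydata.csv",
                    sql = "select * from file where field1 > 100")
```

The second form is the real win for large files: the filtering happens in SQLite before any R object is built.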
This was *very* useful for me when I dealt with a 1.5 Gb text file: http://www.csc.fi/sivut/atcsc/arkisto/atcsc3_2007/ohjelmistot_html/R_and_large_data/
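For reference, the scan()-based approach that article describes boils down to something like the following sketch (the file name, separator, and three-column layout are assumptions to be replaced with your file's actual structure):

```r
## scan() is much faster than read.csv when you declare the type of
## every column up front via the 'what' list, because R then skips
## all type guessing. Column layout here is purely illustrative.
dat <- scan("mydata.csv", sep = ",", skip = 1,   # skip the header line
            what = list(id = integer(),
                        value = numeric(),
                        label = character()))
dat <- as.data.frame(dat)   # scan returns a list; convert if needed
```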
>This was *very* useful for me when I dealt with a 1.5Gb text file
>http://www.csc.fi/sivut/atcsc/arkisto/atcsc3_2007/ohjelmistot_html/R_and_large_data/

Two hours is a *very* long time to transfer a csv file to a db. The author of the linked article has not documented how to use scan() arguments appropriately for the task. I take particular issue with the author's statement that "R is said to be slow, memory hungry and only capable of handling small datasets," which suggests he/she has crummy informants and has not challenged the notion him/herself.

n.vialma, 100,000 records is likely not a lot of data. If it is taking more than two or three minutes, something is wrong. Knowing the record limits in R is a good starting point, but it will only get you part of the way. How many records does your file contain? Do you know how to find out? What are the data types of the records? What is the call you are using to import the records into R? What OS are you using? How much RAM does your system have? What is the size of the R environment on your system? Do you have resource-intensive applications running (such as MS Office)?

A lot of folks on this list have been through what you are now dealing with, so there is plenty of help. I find myself smiling inside & wanting to say "welcome!" Sincerely, KeithC.
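Several of those questions can be answered from within R itself; a quick diagnostic sketch (the file name is hypothetical):

```r
## Quick pre-import diagnostics in base R; "mydata.csv" is a placeholder.
f <- "mydata.csv"

file.info(f)$size                     # file size in bytes
length(count.fields(f, sep = ","))    # number of records (data rows + header)

## Peek at the first few rows to see what data types R infers:
head5 <- read.csv(f, nrows = 5)
str(head5)
```

The str() output of a small sample is especially useful: it gives you the column types to pass to colClasses for the full import.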
On Sat, Mar 27, 2010 at 4:19 AM, n.vialma at libero.it <n.vialma at libero.it> wrote:

Did you read the sections of the "R Data Import/Export" manual (check the Help menu item under Manuals) relating to reading large data sets? There are many things you can do to make read.csv faster on files with a large number of records. You can pre-specify the number of records, so that vectors are not continually being resized, and, probably most important, you can specify the column types. You can read the whole manual in less time than R is taking to read the file.
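Concretely, both suggestions look like this (the file name, row count, and column types are assumptions; substitute those of your file):

```r
## Hypothetical: a file with one integer, one numeric and one character
## column. Pre-specifying nrows lets read.csv allocate its vectors once,
## and colClasses skips type guessing -- usually the biggest speed-up.
DF <- read.csv("mydata.csv",
               nrows        = 100000,
               colClasses   = c("integer", "numeric", "character"),
               comment.char = "")     # disabling comment scanning also helps
```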