Hello All,

I have a data set that is 40k rows long and is taking a lot of time to read in. Is there a way to skip reading even/odd numbered rows, or to read in only rows whose row numbers are multiples of, say, 10? This way I would get the general trend of the data without actually reading the entire thing. The option 'skip' in read.table simply skips the first n rows and reads the rest. I do understand that once the full data set (40k rows) is read in, I can manipulate the data, but the bottleneck here is the first read/scan of the data.

I searched the forum using keywords (conditional skip/skip reading rows/skip data/conditional data read), but couldn't find relevant conversations. I apologize if this has already been discussed, since it does seem hard to imagine that nobody has come across this problem yet.

Any suggestions/comments are welcome.

Thanks,
mnstn

--
View this message in context: http://old.nabble.com/Conditional-read-in-of-data-tp26191091p26191091.html
Sent from the R help mailing list archive at Nabble.com.

That does not seem like a "large" data set. How are you reading it? How many columns does it have? What is "a lot of time" by your definition? You have provided minimal information for obtaining help. I commonly read in files with 300K rows in under 30 seconds. Maybe you need to consider a relational database for storing your data.

On Wed, Nov 4, 2009 at 12:07 AM, mnstn <pavan.namd at gmail.com> wrote:
> I have a 40k rows long data set that is taking a lot of time to be read-in.
> Is there a way to skip reading even/odd numbered rows or read-in only rows
> that are multiples of, say, 10?
>
> ______________________________________________
> R-help at r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?

1. You can pipe your data through a gawk (or other scripting language) process, as in:
   http://tolstoy.newcastle.edu.au/R/e5/help/08/09/2129.html

2. read.csv.sql in the sqldf package on CRAN will set up a database for you, read the file into the database (automatically defining the layout of the table), extract a portion into R based on an SQL statement that you provide, and then destroy the database, all in one statement. It uses the SQLite database, which is included in the RSQLite R package that sqldf depends on, so there is nothing to install separately. See ?read.csv.sql in the package and also see example 13 on the home page:
   http://sqldf.googlecode.com

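For illustration, suggestion 1 (filtering the file before R parses it) might look roughly like the sketch below. This is not taken from the linked post; it assumes an awk-compatible program is available on the PATH (so it will not work as-is on a stock Windows setup), and the file, its column names, and the variable names are all made up for the example.

```r
# Build a small hypothetical file: one header line plus 100 data rows.
path <- tempfile(fileext = ".txt")
write.table(data.frame(id = 1:100, value = (1:100) * 2L),
            path, row.names = FALSE, quote = FALSE)

# awk keeps the header (line 1) and every 10th data row.
# NR is awk's line counter; data row k sits on line k + 1.
# "%%" is a literal "%" inside sprintf().
cmd <- sprintf("awk 'NR == 1 || (NR - 1) %% 10 == 0' %s", shQuote(path))

# read.table() opens the pipe, reads the filtered lines, and closes it,
# so R only ever parses the rows that awk lets through.
every10 <- read.table(pipe(cmd), header = TRUE)

print(every10$id)
```

Changing the 10 in the awk condition changes the sampling interval; for a real file, replace path with the file name and adjust header and sep to match its layout.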