thr3ads.net - R help - [R] Conditional read-in of data [Nov 2009]

mnstn

2009-Nov-04 05:07 UTC

[R] Conditional read-in of data

Hello All,
I have a 40k-row data set that takes a long time to read in.
Is there a way to skip the even- or odd-numbered rows, or to read in only
the rows whose numbers are multiples of, say, 10? That way I would get the
general trend of the data without reading the entire thing. The 'skip'
option in read.table simply skips the first n rows and reads the rest. I
understand that once the full data set (40k rows) has been read in, I can
manipulate the data, but the bottleneck here is the initial read/scan of
the data.

I searched the forum using keywords (conditional skip, skip reading rows,
skip data, conditional data read, etc.) but couldn't find any relevant
conversations. I apologize if this has already been discussed; it seems
hard to imagine that nobody has come across this problem before.

Any suggestions/comments are welcome.
Thanks,
mnstn
-- 
View this message in context:
http://old.nabble.com/Conditional-read-in-of-data-tp26191091p26191091.html
Sent from the R help mailing list archive at Nabble.com.
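
One way to do what the question asks using base R alone is sketched below; the helper name `read_every_nth` is hypothetical. It still reads every line from disk with `readLines()`, but only the kept lines are handed to `read.table()`, which does the comparatively expensive field splitting and type conversion. It assumes the file's first line is a header (set `header = FALSE` otherwise) and relies on the `text =` argument of `read.table()` available in current R versions.

```r
# Hypothetical helper: keep the header (if any) plus every n-th data row.
read_every_nth <- function(path, n = 10, header = TRUE, ...) {
  lines <- readLines(path)
  hdr   <- if (header) lines[1] else character(0)
  body  <- if (header) lines[-1] else lines
  # Keep data rows whose position is a multiple of n: n, 2n, 3n, ...
  keep  <- body[seq_along(body) %% n == 0]
  # Only the kept lines are parsed and type-converted by read.table().
  read.table(text = c(hdr, keep), header = header, ...)
}

# Usage (file name is a placeholder):
# d <- read_every_nth("mydata.txt", n = 10)
```

Changing the condition to `seq_along(body) %% 2 == 1` would keep the odd-numbered rows instead.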

jim holtman

2009-Nov-04 12:56 UTC


That does not seem like a "large" data set. How are you reading it?
How many columns does it have? What is "a lot of time" by your
definition? You have given us very little information to work with. I
commonly read in files with 300K rows in under 30 seconds. Maybe you
need to consider a relational database for storing your data.



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
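
Whether a 40k-row file really needs to be slow can often be checked by telling `read.table()` more about the file. The `read.table()` documentation notes that specifying `colClasses` and `nrows` (even as a mild over-estimate) reduces memory use, and that `comment.char = ""` is appreciably faster than the default. The wrapper name `read_fast` and the column types in the usage lines are placeholders, not taken from the thread.

```r
# Hypothetical wrapper around read.table() using the documented speed-ups.
read_fast <- function(path, col_classes, n_rows = -1, ...) {
  read.table(path,
             header       = TRUE,
             colClasses   = col_classes,  # skip per-column type guessing
             nrows        = n_rows,       # negative values are ignored (read all)
             comment.char = "",           # don't scan for comment characters
             ...)
}

# Usage (file name and column types are placeholders); compare timings with:
# system.time(read.table("mydata.txt", header = TRUE))
# system.time(read_fast("mydata.txt", c("integer", "numeric"), n_rows = 40000))
```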

Gabor Grothendieck

2009-Nov-04 13:15 UTC


1. You can pipe your data through a gawk (or other scripting-language)
process, as in:
http://tolstoy.newcastle.edu.au/R/e5/help/08/09/2129.html

2. read.csv.sql in the sqldf package on CRAN will set up a database
for you, read the file into the database (automatically defining the
layout of the table), extract a portion into R based on an SQL
statement that you provide, and then destroy the database, all in one
statement. It uses the SQLite database, which is included in the
RSQLite R package that it depends on, so there is nothing to install
separately.
See ?read.csv.sql in the package and also example 13 on the home page:
http://sqldf.googlecode.com
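
A minimal sketch of both suggestions, keeping the header plus every n-th data row. The helper names `read_every_nth_awk` and `read_every_nth_sql` are hypothetical. The first assumes a Unix-like shell with `awk` (pass `awk = "gawk"` to use gawk) on the PATH; the second assumes the sqldf package is installed and that SQLite's implicit `rowid` follows the order of the lines in the file, which is an assumption rather than something stated above.

```r
# Approach 1: filter lines with awk so R only parses what survives.
# Keeps line 1 (the header) and every n-th data line after it.
read_every_nth_awk <- function(path, n = 10, awk = "awk", ...) {
  cmd <- sprintf("%s 'NR == 1 || (NR - 1) %% %d == 0' %s",
                 awk, as.integer(n), shQuote(path))
  read.table(pipe(cmd), header = TRUE, ...)
}

# Approach 2: sqldf loads the CSV into a temporary SQLite database and
# returns only the rows selected by the query. Inside the query the table
# is referred to as 'file'; rowid is SQLite's implicit row number.
read_every_nth_sql <- function(path, n = 10, ...) {
  query <- sprintf("select * from file where rowid %% %d = 0", as.integer(n))
  sqldf::read.csv.sql(path, sql = query, ...)
}

# Usage (file names are placeholders):
# d1 <- read_every_nth_awk("mydata.txt", n = 10)
# d2 <- read_every_nth_sql("mydata.csv", n = 10)
```

The awk version passes any extra arguments (for example `sep = ","`) through to `read.table()`; the sqldf version passes them to `read.csv.sql()`.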

