Feng,
I had the same question as you, how to read a subset of data, and the same
reaction as Wensui when I discovered that read.table could not. Even if my
computer's memory were up to it, I am troubled by the idea of reading in 1.8
GB of data (in my case) to get just 4,000 numbers, particularly if I'm then
going to iterate through the entire dataset in 4,000-number chunks.
I ended up defining a NetCDF format to hold my data using the RNetCDF
package, since that package's var.get.nc() function is perfectly able to read
subsets of a NetCDF variable. Furthermore, NetCDF files allow data to be
matrices and even higher-order arrays, from which you can then retrieve any
chunk by passing var.get.nc() 'start' and 'count' arguments in the form of
vectors whose length equals the number of array dimensions. Once a NetCDF
format is defined, all else is painless. One limitation is that the RNetCDF
package only supports version 3 of the NetCDF library, a version that puts a
2 GB limit on a variable's size. Version 4 removes this limitation; I'm
hopeful that some day an R package will provide an interface to the NetCDF
version 4 library.
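A minimal sketch of the kind of call I mean, assuming a file 'mydata.nc' that
already holds a two-dimensional variable named 'readings' (both names are
placeholders, not anything from this thread):

library(RNetCDF)

nc <- open.nc("mydata.nc")

## Pull a 10-by-400 block out of the 'readings' matrix, starting at row 101
## and column 1, without reading anything else from the file; 'start' and
## 'count' each have one entry per array dimension.
chunk <- var.get.nc(nc, "readings",
                    start = c(101, 1),
                    count = c(10, 400))

close.nc(nc)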
John Thaden
Message: 22
Date: Sun, 11 Mar 2007 21:33:04 -0500
From: "jim holtman" <jholtman at gmail.com>
Subject: Re: [R] read.table for a subset of data
To: "Wensui Liu" <liuwensui at gmail.com>
Cc: r-help <r-help at stat.math.ethz.ch>
Message-ID:
<644e1f320703111933g3e5cec0l16b485f2fc0a3dbb at mail.gmail.com>
Content-Type: text/plain
If you know which 10 rows to read, then you can 'skip' to them, but the
system still has to read each line one at a time.
I have a 200,000 line csv file of numerics that takes me 4 seconds to read in
with 'read.csv' using 'colClasses', so I would guess your 100K line file
would take half of that. Is 2 seconds of time a waste of resources?
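A rough illustration of the 'skip' approach (the file name and the all-numeric
column type here are assumptions, not taken from your data):

## Read only rows 501-510 of a large CSV: 'skip' jumps past the header and
## the first 500 data rows, 'nrows' stops after 10 lines. Column names are
## lost when skipping, so header = FALSE is needed.
subset10 <- read.csv("big.csv",
                     skip = 501,
                     nrows = 10,
                     header = FALSE,
                     colClasses = "numeric")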
On 3/11/07, Wensui Liu <liuwensui at gmail.com> wrote:
>
> Jim,
>
> Glad to see your reply.
>
> Referring to your email, what if I just want to read 10 rows from a csv
> table with 100000 rows? Do you think it is a waste of resources to read
> the whole table in?
> Any thoughts?
>
> wensui
>
> On 3/11/07, jim holtman <jholtman at gmail.com> wrote:
> > Why can't you read in the whole data set and then create the subsets?
> > This is easily done with 'split' (see the sketch at the end of this
> > thread). If the data is too large, then consider a database.
> >
> > On 3/11/07, gnv shqp <gnvshqp at gmail.com> wrote:
> > >
> > > Hi R-experts,
> > >
> > > I have data from four conditions of an experiment. I tried to create
> > > four subsets of the data with read.table, for example,
> > > read.table("Experiment.csv", subset=(condition=="1")).
> > > I found a similar post in the archive, but the answer to that post was
> > > no. Any new ideas about reading subsets of data with read.table?
> > >
> > > Thanks!
> > >
> > > Feng
> > >
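For completeness, a minimal sketch of the read-everything-then-split approach
Jim mentions above ('Experiment.csv' and the 'condition' column follow Feng's
example; nothing here is tested against his actual data):

## Read the whole table once, then break it into one data frame per
## experimental condition.
experiment <- read.csv("Experiment.csv")
by.condition <- split(experiment, experiment$condition)

## Each subset is then available by its condition value, e.g.
condition1 <- by.condition[["1"]]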