David Vonka
2006-Jul-12 12:37 UTC
[R] Is it possible to only read a subset by read.table ?
Hello,

is it possible to do something like

DATA <- read.table(file="blabla.dat",subset=(sex=="male")),

i.e. make R read only a subset of a csv file? I think it would be useful in case of very big datasets, but I can't find such a feature.

Thanks for an answer,
David Vonka
Gabor Grothendieck
2006-Jul-12 12:49 UTC
[R] Is it possible to only read a subset by read.table ?
You can use pipe with read.table as in: http://tolstoy.newcastle.edu.au/R/help/06/02/20379.html

Also note skip= and nrows= arguments to read.table.
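A minimal sketch of the pipe approach, not taken from the linked post. It assumes a Unix-like system with awk on the PATH, and the sample file, its column names and the awk filter are made up for illustration. An exact field comparison is used because a plain text search for "male" would also match "female".

```r
# Hypothetical sample file standing in for blabla.dat (comma-separated, with header).
path <- tempfile(fileext = ".csv")
writeLines(c("id,sex,age",
             "1,male,34",
             "2,female,29",
             "3,male,51",
             "4,female,42"), path)

# awk keeps the header line (NR == 1) and rows whose 2nd field is exactly "male".
# Assumption: an awk binary is available (Unix-like system, or Cygwin on Windows).
cmd <- paste("awk -F, 'NR == 1 || $2 == \"male\"'", shQuote(path))

# read.table() reads from the pipe, so only the filtered lines ever reach R.
males <- read.table(pipe(cmd), header = TRUE, sep = ",")

# skip= and nrows= select by position instead: here data rows 2 and 3 only.
# The header line is skipped as well, so column names are supplied by hand.
middle <- read.table(path, sep = ",", skip = 2, nrows = 2,
                     col.names = c("id", "sex", "age"))
```

The filtering happens outside R, so memory use in R depends on the size of the subset, not of the whole file.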
It's not so straightforward as that, but you could construct something with readLines().

-roger

--
Roger D. Peng | http://www.biostat.jhsph.edu/~rpeng/
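One way such a readLines() construction might look, as a rough sketch only. The sample file and column layout are hypothetical, and the naive split on commas assumes no field contains a quoted comma.

```r
# Hypothetical sample file standing in for blabla.dat.
path <- tempfile(fileext = ".csv")
writeLines(c("id,sex,age",
             "1,male,34",
             "2,female,29",
             "3,male,51",
             "4,female,42"), path)

con <- file(path, open = "r")
header <- readLines(con, n = 1)      # keep the header line for parsing later
kept <- character(0)
repeat {
  # A chunk size of 2 only exercises the loop here; use thousands of lines in practice.
  chunk <- readLines(con, n = 2)
  if (length(chunk) == 0) break      # end of file reached
  fields <- strsplit(chunk, ",", fixed = TRUE)
  is_male <- vapply(fields, function(f) f[2] == "male", logical(1))
  kept <- c(kept, chunk[is_male])    # retain only the raw lines that match
}
close(con)

# Parse the retained lines (plus the header) into a data frame.
males <- read.table(text = c(header, kept), header = TRUE, sep = ",")
```

Only one chunk of raw text plus the matching lines is held in memory at any time.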
Prof Brian Ripley
2006-Jul-12 13:54 UTC
[R] Is it possible to only read a subset by read.table ?
On Wed, 12 Jul 2006, David Vonka wrote:

> is it possible to do something like
>
> DATA <- read.table(file="blabla.dat",subset=(sex=="male")),
>
> i.e. make R read only a subset of a csv file ?
> I think it would be useful in case of very big datasets,
> but I can't find such a feature.

No. It is possible to read only some columns: see colClasses.

It is not clear that the ability to skip based on something defined on other columns would actually be useful, especially as you can read datasets in blocks, process the blocks and combine the results.

--
Brian D. Ripley, ripley at stats.ox.ac.uk
Professor of Applied Statistics, http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel: +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UK Fax: +44 1865 272595
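Both suggestions could be sketched roughly as follows. The sample file, the column types and the block size of 2 are assumptions made for illustration, and catching every error as "end of file" is a shortcut that real code would want to refine.

```r
# Hypothetical sample file standing in for blabla.dat.
path <- tempfile(fileext = ".csv")
writeLines(c("id,sex,age",
             "1,male,34",
             "2,female,29",
             "3,male,51",
             "4,female,42",
             "5,male,23"), path)

# Reading only some columns: a colClasses entry of "NULL" skips that column.
two_cols <- read.table(path, header = TRUE, sep = ",",
                       colClasses = c("NULL", "character", "integer"))

# Reading in blocks: keep one connection open and call read.table() repeatedly.
block_size <- 2                      # tiny on purpose; use many thousands in practice
con <- file(path, open = "r")
col_names <- strsplit(readLines(con, n = 1), ",", fixed = TRUE)[[1]]
blocks <- list()
repeat {
  block <- tryCatch(
    read.table(con, sep = ",", nrows = block_size, col.names = col_names,
               colClasses = c("integer", "character", "integer")),
    error = function(e) NULL)        # read.table() signals an error when no lines remain
  if (is.null(block)) break
  # Process the block (here: keep the males), then store the reduced result.
  blocks[[length(blocks) + 1]] <- block[block$sex == "male", ]
  if (nrow(block) < block_size) break  # a short block means the file is exhausted
}
close(con)

# Combine the per-block results.
males <- do.call(rbind, blocks)
```

Each block is reduced before the next one is read, so the full data set never has to be in memory at once.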
bogdan romocea
2006-Jul-12 16:03 UTC
[R] Is it possible to only read a subset by read.table ?
It's possible and straightforward (just don't use R). IMHO the GNU Core Utilities (http://www.gnu.org/software/coreutils/) plus a few other tools such as sed, awk and grep are much more appropriate than R for processing massive text files. (Get a good book about UNIX shell scripting. On Windows you can use Services For Unix or Cygwin.)

Also, here's an example that you could adapt to print the males from your data set to a separate file, which you could then import in R.

    #---print specific lines to another file---
    suffix=_JAN06
    for F in *data*
    do
        echo "$F"
        # -n suppresses default output; /pattern/p prints only lines containing a January 2006 date
        sed -n -e '/2006-01-[0-9][0-9]/p' "$F" > "${F}${suffix}"
    done