thr3ads.net - R devel - [Rd] read.table() and NULL for colClasses [Jul 2004]

Henrik Bengtsson

2004-Jul-28 21:11 UTC

[Rd] read.table() and NULL for colClasses

Hi,

is there a reason for not supporting NULL or "NULL" values for argument
colClasses in read.table(), much like you can use NULL values for argument
'what' in scan()? This would help quite a bit when reading large data files
where only a few columns are of interest.
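
For reference, the scan() behaviour referred to here can be sketched as follows. The inline data and the field names id and value are made up for illustration, and the sketch assumes an R version whose scan() accepts the text argument.

```r
# Three whitespace-separated fields per record; the middle one is unwanted.
txt <- c("1 a 2.5", "2 b 3.5", "3 c 4.5")

# A NULL entry in 'what' tells scan() to skip that field.
fields <- scan(text = txt,
               what = list(id = integer(), NULL, value = numeric()),
               quiet = TRUE)

# The result keeps one list slot per field; the skipped slot is NULL.
str(fields)
```

The skipped field is never stored, which is where the memory saving for wide files would come from.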

I've modified read.table() so that it also calls scan(what=...) with NULLs for
the fields to be skipped. Here's the diff of readtable.R (from the
R-1.9.1.tgz; 9,591,217 bytes):

diff readtable.new.R readtable.R
117,123d116
<     # Skip NULL columns in scan()
<     void <- sapply(colClasses, FUN=identical, "NULL") |
<             sapply(colClasses, FUN=is.null)
<     # If all (data) columns are NULL, return empty data frame.
<     if (sum(!void) <= 1*rlabp)
<       return(data.frame())
<     what[void] <- list(NULL)
131c124
<     nlines <- length(data[[which(!void)[1]]])
---
>     nlines <- length(data[[1]])
161c154
<     for (i in (1:cols)[!known & !void]) {
---
>     for (i in 1:cols) {
171,178d163
<     # Skipped row names equals row.names=NULL.
<     if (rlabp) {
<       if (void[1]) {
<         row.names <- NULL
<         data <- data[-1]
<       }
<       void <- void[-1]
<     }
201,202d185
<     # Remove NULL columns
<     data[void] <- NULL
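
The diff relies on two different forms of NULL assignment, which behave differently on lists. A minimal sketch of that distinction, using a hypothetical four-column colClasses list and stand-in values for 'what' and 'data' (none of these values come from readtable.R itself):

```r
# Hypothetical colClasses given as a list, mixing "NULL" strings and NULL.
colClasses <- list("integer", NULL, "NULL", "numeric")

# TRUE for every column that should be skipped (same test as in the diff).
void <- sapply(colClasses, FUN = identical, "NULL") |
        sapply(colClasses, FUN = is.null)

# Assigning list(NULL) keeps the slots and sets them to NULL,
# which is the form scan() expects in 'what'.
what <- list(integer(), character(), character(), numeric())
what[void] <- list(NULL)
length(what)   # still one slot per column

# Assigning plain NULL deletes the selected elements instead.
data <- list(1:2, NULL, NULL, c(0.5, 1.5))
data[void] <- NULL
length(data)   # only the kept columns remain
```

This is why the patch uses list(NULL) before the scan() call and plain NULL when removing the skipped columns from the result.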

and a diff for read.table.Rd:

diff read.table.new.Rd read.table.Rd
102,104c102
<     \code{NA} when \code{\link{type.convert}} is used.  Columns for
<     which the value is \code{"NULL"} (or \code{NULL} in a list) are
<     skipped. NB: \code{as} is
---
>     \code{NA} when \code{\link{type.convert}} is used.  NB: \code{as} is
181,183c179
<   the five atomic vector classes. Skipping columns with \code{"NULL"}
<   (or \code{NULL} will also require less memory.
<
---
>   the five atomic vector classes.

Note that there is already a (what I assume is unintentional) effect of
setting a colClasses entry to "NULL". In the data conversion, which happens
*after* scan() has read the data anyway, "NULL" will NULL a column via
as(x, "NULL"), but unfortunately the wrong column. If not the above
modifications, maybe a warning for the latter?
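
As a sketch of the behaviour being requested: R releases in which the read.table() help page lists "NULL" as a colClasses value drop the corresponding columns from the result. The inline table and its column names are invented here, and the text argument of read.table() is assumed to be available.

```r
# A small made-up table with a middle column that is not needed.
txt <- "id name value
1 a 2.5
2 b 3.5
3 c 4.5"

# "NULL" (as a character string) marks the column to be skipped.
df <- read.table(text = txt, header = TRUE,
                 colClasses = c("integer", "NULL", "numeric"))

names(df)
str(df)
```

On an R version without this support, the same call is the case described above, where the conversion step mishandles the column.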

Best wishes

Henrik Bengtsson

Dept. of Mathematical Statistics @ Centre for Mathematical Sciences 
Lund Institute of Technology/Lund University, Sweden (+2h UTC)
+46 46 2229611 (off), +46 708 909208 (cell), +46 46 2224623 (fax)
h b @ m a t h s . l t h . s e, http://www.maths.lth.se/~hb/

Prof Brian Ripley

2004-Jul-28 22:13 UTC

[Rd] read.table() and NULL for colClasses

NULL is not a valid value for colClasses and I don't see why you thought
it was.  colClasses has to be character according to the documentation, so
"NULL" is allowed but not NULL.

Your diff appears to be backwards for a patch.  A patch against the 
current R-devel sources is what is needed, including some regression 
tests.

On Wed, 28 Jul 2004, Henrik Bengtsson wrote:
> Hi,
> 
> is there a reason for not supporting NULL or "NULL" values for argument
> colClasses in read.table(), much like you can use NULL values for argument
> 'what' in scan()? This would help quite a bit when reading large data files
> where only a few columns are of interest.
Is that a common enough case to make this worth the code complication,
given that scan() (or better, a DBMS) can be used?  The usual reason is
that R is maintained by a small and overworked team, and it is adding
complications that needs justification, not leaving them out.
> I've modified read.table() so that it also calls scan(what=...) with NULLs for
> the fields to be skipped. Here's the diff of readtable.R (from the
> R-1.9.1.tgz; 9,591,217 bytes):
> 
> diff readtable.new.R readtable.R
> 117,123d116
> <     # Skip NULL columns in scan()
> <     void <- sapply(colClasses, FUN=identical, "NULL") |
> <             sapply(colClasses, FUN=is.null)
> <     # If all (data) columns are NULL, return empty data frame.
> <     if (sum(!void) <= 1*rlabp)
> <       return(data.frame())
> <     what[void] <- list(NULL)
> 131c124
> <     nlines <- length(data[[which(!void)[1]]])
> ---
> >     nlines <- length(data[[1]])
> 161c154
> <     for (i in (1:cols)[!known & !void]) {
> ---
> >     for (i in 1:cols) {
> 171,178d163
> <     # Skipped row names equals row.names=NULL.
> <     if (rlabp) {
> <       if (void[1]) {
> <         row.names <- NULL
> <         data <- data[-1]
> <       }
> <       void <- void[-1]
> <     }
> 201,202d185
> <     # Remove NULL columns
> <     data[void] <- NULL
> 
> and a diff for read.table.Rd:
> 
> diff read.table.new.Rd read.table.Rd
> 102,104c102
> <     \code{NA} when \code{\link{type.convert}} is used.  Columns for
> <     which the value is \code{"NULL"} (or \code{NULL} in a list) are
> <     skipped. NB: \code{as} is
> ---
> >     \code{NA} when \code{\link{type.convert}} is used.  NB: \code{as} is
> 181,183c179
> <   the five atomic vector classes. Skipping columns with \code{"NULL"}
> <   (or \code{NULL} will also require less memory.
> <
> ---
> >   the five atomic vector classes.
> 
> Note that there is already a (what I assume is unintentional) effect of
> setting a colClasses entry to "NULL". In the data conversion, which happens
> *after* scan() has read the data anyway, "NULL" will NULL a column via
> as(x, "NULL"), but unfortunately the wrong column. If not the above
> modifications, maybe a warning for the latter?
That's not usage as documented so the effect is definitely unintentional.
We can't catch all misuses!

-- 
Brian D. Ripley,                  ripley@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595
