thr3ads.net - R help - [R] Different behaviour of data() [Jan 2002]

Jan_Svatos@eurotel.cz

2002-Jan-03 11:16 UTC

[R] Different behaviour of data()

Dear List,

I frequently use the data() function to load csv files (with separator ";") into an R session. Typically

data(myfile)

loads myfile.csv from my working/data directory into R.
Now, in version 1.4.0, everything works as expected except for one difference: values that older versions read in "num" mode are now read in "int" mode, and values larger than 2147483647 (2^31 - 1) are converted to that value.

This has consequences when reading data like the following:

<example>

The file alerts.csv looks like:

"IMSI";"DialedDigits";"Cnt";"Pri";"Dur"
"230020100010125";"+28491628975809";3;332;2391
"230020100010125";"+28491723744868";1;12;75
etc...

with the first row giving the column names of the resulting data frame.

<R-1.3.1>
In a 1.3.1 session,

> data(alerts); str(alerts$IMSI)

gives

 num [1:2793] 2.3e+14 2.3e+14 2.3e+14 2.3e+14 2.3e+14 ...

> str(as.character(alerts$IMSI))

gives

 chr [1:2793] "230020100010125" "230020100010125" "230020100010125" ...

and

> n <- length(unique(alerts$IMSI)); n

gives 125 (i.e. the data are read as they are).

</R-1.3.1>

<R-1.4.0>

whereas the same commands in 1.4.0 give

 int [1:2793] 2147483647 2147483647 2147483647 ...

and

> n <- length(unique(alerts$IMSI)); n

gives 1 (i.e. it reflects the conversion of the data to int mode, which destroys the information in the IMSI numbers, which are always 15-digit numbers).

</R-1.4.0>
</example>
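The overflow can be checked directly. A minimal sketch, assuming a current R version: there, coercing an out-of-range value to integer gives NA with a warning instead of clamping it to 2147483647 as described above for 1.4.0 on Win32, but the effect on unique() is the same. The second IMSI is hypothetical.

```r
# Largest value an R integer can hold: 2^31 - 1
int_max <- .Machine$integer.max
print(int_max)

# IMSI strings: the first is from the sample rows of alerts.csv,
# the second is a hypothetical second subscriber
imsi <- c("230020100010125", "230020100010126")

# A 15-digit IMSI lies far outside the integer range
print(as.numeric(imsi[1]) > int_max)

# So any conversion to integer must lose information. Current R gives
# NA (with a warning, suppressed here) rather than clamping to int_max,
# but either way the distinct IMSIs collapse into a single value.
imsi_int <- suppressWarnings(as.integer(imsi))
print(length(unique(imsi_int)))

# Kept as character strings, both IMSIs stay distinct
print(length(unique(imsi)))
```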

I could not find any comment on this new behaviour of data() in http://cran.r-project.org/src/base/NEWS.
What I did find was:

---
read.table() has new arguments `nrows' and `colClasses'.  If the
           latter is NA (the default), conversion is attempted to
           logical, integer, numeric or complex, not just to numeric
---

Should I use read.table() with colClasses specified instead of data()?

I could, but this would require lots of manual changes to my R scripts, which is unpleasant and risks introducing typos.

Is there a more systematic way to solve this problem?
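For reference, the read.table() route with colClasses can be kept in one place so that each script needs only a one-line change. A minimal sketch, assuming a current R version (the text= argument and named, partial colClasses are later additions, not available in 1.4.0); the helper read_ids_as_character() is hypothetical, and the inline text stands in for alerts.csv:

```r
# Stand-in for alerts.csv: the two sample data rows from the message
csv_text <- '"IMSI";"DialedDigits";"Cnt";"Pri";"Dur"
"230020100010125";"+28491628975809";3;332;2391
"230020100010125";"+28491723744868";1;12;75'

# Hypothetical helper (not part of base R): fixes the separator and
# reads the named identifier columns as character. Columns not named
# in colClasses keep the default automatic conversion.
read_ids_as_character <- function(id_cols, ...) {
  cls <- setNames(rep("character", length(id_cols)), id_cols)
  read.table(..., sep = ";", header = TRUE, colClasses = cls)
}

alerts <- read_ids_as_character(c("IMSI", "DialedDigits"), text = csv_text)
str(alerts)
```

In a script, the same call would read a real file with read_ids_as_character(c("IMSI", "DialedDigits"), file = "alerts.csv").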
>version
platform i386-pc-mingw32
arch     x86
os       Win32
system   x86, Win32
status
major    1
minor    4.0
year     2001
month    12
day      19
language R

Thanks In Advance,
Jan

-------------------------------------------------
designed for _monospaced_ font
-------------------------------------------------
/- Jan Svatos,  PhD         Sokolovska 855/225 -/
/- Data Analyst,            Prague 9           -/
/- Eurotel Praha            190 00             -/
/- jan_svatos at eurotel.cz    Czechia            -/
-------------------------------------------------

-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-.-
r-help mailing list -- Read http://www.ci.tuwien.ac.at/~hornik/R/R-FAQ.html
Send "info", "help", or "[un]subscribe"
(in the "body", not the subject !)  To: r-help-request at
stat.math.ethz.ch
_._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._._

Prof Brian Ripley

2002-Jan-03 12:20 UTC

[R] Different behaviour of data()

This has nothing to do with data(). data() uses read.table() to read .csv
files, and that *is* in its help file!

Also, these fields are neither numeric nor integer but strings, so you can't
expect the standard methods to make sense of them. What `Writing R
Extensions' recommends is that you read them in once, correctly, and
then dump them as .rda files. *Then* data() will work as you expected.
If you use compression, the files might be much smaller, too.
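The read-once-then-save workflow might look like this in current R. A minimal sketch: the column classes are assumptions based on the sample rows, the file goes to a temporary directory rather than a real data/ directory, and load() stands in for data() so the example does not depend on where data() searches.

```r
# Stand-in for alerts.csv (two sample rows from the original message)
csv_text <- '"IMSI";"DialedDigits";"Cnt";"Pri";"Dur"
"230020100010125";"+28491628975809";3;332;2391
"230020100010125";"+28491723744868";1;12;75'

# 1. Read the data once, correctly: identifiers as character
alerts <- read.table(text = csv_text, sep = ";", header = TRUE,
                     colClasses = c("character", "character",
                                    "integer", "integer", "integer"))

# 2. Save it as a compressed .rda file (temporary directory here;
#    in practice this would be the data/ directory used by data())
rda_file <- file.path(tempdir(), "alerts.rda")
save(alerts, file = rda_file, compress = TRUE)

# 3. Later sessions restore the stored object without re-parsing the CSV
rm(alerts)
load(rda_file)
str(alerts$IMSI)
```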

It is not clear to me why type.convert does not object to overflowing integers,
but that will depend on the implementation of strtol on your platform. We
might manage to improve it. But in any case, I think you ought to read
these fields as character.
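The same problem shows up with the DialedDigits column. A small sketch in current R, where type.convert() (the function read.table() uses for its automatic conversion) turns these strings into doubles, so the leading "+" does not survive a round trip back to text; declaring the column as character keeps the values exactly as written. Printed formats may differ between R versions.

```r
# Values from the DialedDigits column of the sample rows
dialed <- c("+28491628975809", "+28491723744868")

# Let type.convert() guess the type, as read.table() does by default
guessed <- type.convert(dialed, as.is = TRUE)
print(class(guessed))

# Converting back to text no longer reproduces the original strings
print(as.character(guessed))

# Declaring the column as character keeps the values as written
kept <- read.table(text = dialed, colClasses = "character")$V1
print(identical(kept, dialed))
```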


-- 
Brian D. Ripley,                  ripley at stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272860 (secr)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595


Jan_Svatos@eurotel.cz

2002-Jan-03 12:46 UTC

[R] Different behaviour of data()

Thanks to Prof. Ripley for the quick and useful answer.
Yes, I will either change the data-acquisition tool so that it delivers these columns as numbers rather than character strings,
or read them as character and then manage them with as.factor().
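The second option can be sketched as follows. The IMSI values are assumed to be already read as character (for example with colClasses), and the third value is hypothetical:

```r
# IMSI values read as character (the third value is hypothetical)
imsi <- c("230020100010125", "230020100010125", "230020100010126")

# Convert to a factor: each distinct IMSI becomes one level
imsi_f <- as.factor(imsi)
print(nlevels(imsi_f))   # number of distinct subscribers
print(table(imsi_f))     # number of rows per subscriber
```

Because the levels are the original strings, no digits are lost, and counting or grouping by subscriber works without any numeric conversion.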

Jan
