thr3ads.net - R help - [R] More difficulties in getting data into R [Jul 2004]

Ajay Shah

2004-Jul-05 10:58 UTC

[R] More difficulties in getting data into R

In order to get around the problems of my posting a few minutes ago, I
thought:

$ awk -F\| '(NR > 2) {print $2}' cmie_firm_data.text > col2
$ awk -F\| '(NR > 2) {print $4}' cmie_firm_data.text > col4
$ paste col2 col4 | head -2
-510.45 -510.27
60700   101900
$ paste col2 col4 | tail -2
28648.12        31617.02
491014.77       494308.52
$ wc -l col2 col4
  89323 col2
  89323 col4
 178646 total

So all is well.

But R doesn't like it:

$ R --vanilla < picture.R 

R : Copyright 2004, The R Foundation for Statistical Computing
Version 1.9.1  (2004-06-21), ISBN 3-900051-00-3
> col2 <- read.table(file="col2")
> col4 <- read.table(file="col4")
> print(nrow(col2))
[1] 89323
> print(nrow(col4))
[1] 88746

Why might I be getting 89,323 and 88,746 obs for two files which `wc'
believes are each 89,323 lines long?

I checked, and there is no single quote or C-m in either file.

-- 
Ajay Shah                                                   Consultant
ajayshah at mayin.org                      Department of Economic Affairs
http://www.mayin.org/ajayshah           Ministry of Finance, New Delhi

Liaw, Andy

2004-Jul-06 12:33 UTC

[R] More difficulties in getting data into R

Could it be that you happen to have `#' in `col4'?  Try either (or
both):

1. read.table(..., comment.char="")
2. scan(...)

HTH,
Andy
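
The `#' hypothesis can also be checked from the shell before rerunning R. The sketch below is only an illustration: it uses a small made-up stand-in for col4 (the values, including the `#N/A' entry, are hypothetical) and counts lines that begin with `#', plus blank lines, since read.table also skips blank lines by default (blank.lines.skip=TRUE). On the real file, the counts would be compared against the 577 rows that went missing (89323 - 88746).

```shell
# Build a tiny stand-in for col4 (hypothetical data, not the real file).
tmp=$(mktemp)
printf '%s\n' '-510.27' '#N/A' '101900' '' '31617.02' > "$tmp"

# Number of lines as wc sees them.
total=$(wc -l < "$tmp" | tr -d ' ')

# Lines whose first non-blank character is '#': with the default
# comment.char="#", read.table sees nothing on such a line and drops it.
hash_lines=$(grep -c '^[[:space:]]*#' "$tmp")

# Lines that are empty or whitespace only: skipped by read.table by default.
blank_lines=$(grep -c '^[[:space:]]*$' "$tmp")

echo "total=$total hash=$hash_lines blank=$blank_lines"
rm -f "$tmp"
```

To run the same checks on the real data, replace "$tmp" with col4 and drop the printf and rm lines.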

Liaw, Andy

2004-Jul-06 13:06 UTC

[R] More difficulties in getting data into R

This is what I'd try:

col2and4 <- matrix(scan(pipe("cut -d'|' -f2,4 cmie_firm_data.text"),
                        sep="|", skip=2),
                   ncol=2, byrow=TRUE)

Andy
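
To see what R receives from such a pipe, the cut step can be tried on its own. This is a minimal sketch on a made-up four-field sample; the real layout of cmie_firm_data.text is not shown in the thread, so the two header lines and the field contents here are assumptions. It shows that cut keeps the `|' delimiter between the selected fields, so whatever reads the output has to split on `|' rather than on whitespace.

```shell
# Hypothetical sample: two header lines, then '|'-separated records.
sample=$(mktemp)
printf '%s\n' 'header line 1' 'header line 2' \
    'A|-510.45|x|-510.27' 'B|60700|y|101900' > "$sample"

# Skip the first two lines, then keep fields 2 and 4.
out=$(tail -n +3 "$sample" | cut -d'|' -f2,4)

printf '%s\n' "$out"
rm -f "$sample"
```

On this sample the two output lines are `-510.45|-510.27' and `60700|101900'.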
