thr3ads.net - R help - [R] problems with function read.table [Sep 2011]


Samir Benzerfa

2011-Sep-08 14:23 UTC

[R] problems with function read.table

Hello everyone

 

I have a couple of questions about the usage of the R function
"read.table(.)". My starting point is that I want to import a matrix
(consisting of time and daily stock returns of many stocks) into R. Most of
the data is numeric, but some values are missing (blanks) and in other
cases I have the string "#DIV/0!" (from Excel). My goal is to do some
regression analysis with this matrix. My questions are the following:

 

1.       How can I, in general, tell R to automatically replace specific
numbers or characters in tables with others? (For example, to replace every
"#DIV/0!" with the number 0 or simply "NA".)

2.       How can I tell R to fill blanks with the number 0 or "NA"?

3.       How can I tell R to omit the "NA" fields in the calculations, but
not the whole row or column? (I noticed that the function "na.omit" omits
the whole row.)

 

Many thanks for your help!

 

Sincerely,

Samir



Carlos Ortega

2011-Sep-08 15:55 UTC


[R] problems with function read.table

Hi,

If you read the help page for read.table carefully, you will find this:


na.strings: a character vector of strings which are to be interpreted as
NA values. Blank fields are also considered to be missing values in logical,
integer, numeric and complex fields.

So, both NAs and blank fields are considered as NAs directly by read.table.
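
As a minimal sketch of that behaviour, assuming a tab-separated file: the sample data and the names txt and dat are made up for illustration, and the text= argument needs a reasonably recent version of R. A blank field and a string listed in na.strings should both come back as NA.

```r
# Hypothetical tab-separated data: one empty field and one Excel error value.
txt <- paste("date\trtn\tvwretd",
             "19700102\t0.000686\t0.00547",
             "19700105\t\t0.018947",
             "19700106\t#DIV/0!\t-0.007233",
             sep = "\n")

# comment.char = "" stops "#" from starting a comment;
# na.strings lists every string that should be read as NA.
dat <- read.table(text = txt, header = TRUE, sep = "\t",
                  comment.char = "", na.strings = c("NA", "#DIV/0!"))

str(dat)          # rtn is read as a numeric column
is.na(dat$rtn)    # FALSE TRUE TRUE
```

With a real file, the text= argument would be replaced by the file name.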

Once you have imported your data, you can use any of the string
manipulation functions (sub() or gsub()) to change your "#DIV/0!" values to
"NA". Another option is to modify your Excel file and guard the division by
zero with an "IF" that returns NA when it happens.
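
A rough sketch of that clean-up after the import, assuming the column arrived as character strings. The vector r and its values are invented for the example. Note that fixed = TRUE is used so that "#DIV/0!" is matched literally and not as a regular expression.

```r
# A column as it might look after an import without na.strings (hypothetical data).
r <- c("1.25", "#DIV/0!", "4.5")

# Variant 1: turn the Excel error into NA, then convert to numeric.
r_na <- r
r_na[r_na == "#DIV/0!"] <- NA
r_na <- as.numeric(r_na)

# Variant 2: replace the Excel error with 0 instead.
r_zero <- as.numeric(sub("#DIV/0!", "0", r, fixed = TRUE))

r_na      # 1.25   NA 4.50
r_zero    # 1.25 0.00 4.50
```

Whether 0 or NA is the right replacement depends on the analysis; for returns data a division error usually means "unknown", which argues for NA.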

And finally, instead of using na.omit(), use the option na.rm=TRUE in your
calculations:
> mean(c(12,23,24,45,67,NA), na.rm=TRUE)
[1] 34.2
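
The same idea extends to a whole table. The following is only a sketch with invented numbers (the data frame dat and its values are assumptions; the column names are borrowed from the thread). Column-wise summaries skip the NA in each column separately, and lm() with its default na.action drops only the rows that are incomplete for the variables named in the formula.

```r
# Hypothetical returns with missing values in different rows.
dat <- data.frame(rtn    = c(0.01, NA, 0.03, 0.02),
                  vwretd = c(0.02, 0.01, NA, 0.04),
                  sprtrn = c(0.015, 0.005, 0.02, 0.03))

# Each column uses its own non-missing values.
colMeans(dat, na.rm = TRUE)
sapply(dat, sd, na.rm = TRUE)

# Only row 2 is dropped here: the NA in vwretd (row 3) is irrelevant
# because vwretd is not part of the formula.
fit <- lm(rtn ~ sprtrn, data = dat)
nobs(fit)    # 3
```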


Regards,
Carlos Ortega
www.qualityexcellence.es


Petr PIKAL

2011-Sep-09 12:29 UTC


[R] problems with function read.table

Hi
> Hello,
>
> In the meantime I found the cause of the problem with "#". The problem was
> that "#" and "none" are comments by default, so I turned comments off with
> comment.char="" as stated in the help.
Hm. You are not telling the whole story.
I made a sample Excel sheet with blank values and #DIV/0! results (two
simple columns with missing values, plus a first column that divides by
those missing values):
> x <- read.delim("clipboard")
> x
            r a  b
1        1.25 5  4
2 1.166666667 7  6
3 0.142857143 1  7
4     #DIV/0! 5 NA
5         4.5 9  2
6           2 6  3
7     #DIV/0! 4 NA

A plain read.delim correctly gave me NA in place of the blank values, and
kept #DIV/0! in the first column, which therefore became a factor.
> y <- read.delim("clipboard", na.strings=c("NA", "#DIV/0!"))
> y
          r a  b
1 1.2500000 5  4
2 1.1666667 7  6
3 0.1428571 1  7
4        NA 5 NA
5 4.5000000 9  2
6 2.0000000 6  3
7        NA 4 NA

When I defined a vector of na.strings, I correctly got NA values for both
the blank cells and the #DIV/0! cells of the Excel sheet. So I wonder where
your problems with blank values come from.
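
One likely reason why read.delim copes with "#DIV/0!" without any comment.char argument, while read.table does not, is that the two functions have different defaults. The sketch below checks those defaults and repeats the import without the clipboard; the string txt is a made-up stand-in for the copied Excel cells.

```r
# read.table treats "#" as a comment character by default, read.delim does not.
formals(read.table)$comment.char    # "#"
formals(read.delim)$comment.char    # ""

# Hypothetical stand-in for the clipboard contents (tab-separated).
txt <- paste("r\ta\tb",
             "1.25\t5\t4",
             "#DIV/0!\t5\t",
             "4.5\t9\t2",
             sep = "\n")

y <- read.delim(text = txt, na.strings = c("NA", "#DIV/0!"))
y$r    # 1.25   NA 4.50
y$b    # 4 NA 2
```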
> 
> For the problem with blanks I still haven't found the solution, even though
> it seems to be a frequent problem. I got a result with fill=T, but this
> fills my row with an NA at the end. I want to have the NA exactly in the
> field where there is a blank in the .txt file.
It seems to me that you are not using the correct delimiter. For Excel it is
tab ("\t"). For csv it can be either ";" or "," depending on your locale. You
need to show us a small part of the file, preferably together with your
read.* command and the result you got.
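
To illustrate the delimiter point, here is a small sketch with invented data (the string txt and the names shifted and aligned are assumptions, and the file is assumed to be tab-separated). With the read.table default of sep = "", any run of whitespace counts as one separator, so an empty field vanishes and fill=TRUE can only pad the end of the row. With sep = "\t" the empty field stays in its own column.

```r
# Hypothetical tab-separated data; the second data row has an empty rtn field.
txt <- paste("Date\trtn\tvwretd",
             "19700108\t0.002034\t0.000564",
             "19700109\t\t-0.002797",
             sep = "\n")

# Whitespace separator: the two consecutive tabs collapse into one,
# the value moves one column to the left and the NA lands at the end.
shifted <- read.table(text = txt, header = TRUE, fill = TRUE)

# Tab separator: the NA appears exactly where the blank was.
aligned <- read.table(text = txt, header = TRUE, sep = "\t")

shifted[2, ]    # rtn = -0.002797, vwretd = NA
aligned[2, ]    # rtn = NA,        vwretd = -0.002797
```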

Regards
Petr
> 
> Date                      rtn               vwretd                ewretd    sprtrn
> 19700102             0.000686              0.00547 0.033450    0.010211
> 19700105             0.009596 0.018947    0.004946
> 19700106             #DIV0!                -0.007233    -0.006848
> 19700107             0.000678              -0.001272    0.003559              -0.002047
> 19700108             0.002034              0.000564                0.11    0.000540
> 19700109                                  -0.002797               0    -0.003021
> 19700113             0.017335              0.000737 -0.001090    0
> 
> Can you provide a solution?
> 
> Thanks,
> Samir
> 
> p.s. I know that my questions seem obvious to you, I'm sorry for that,
> but I just started to work with R ;)
> 
> -----Original Message-----
> From: Petr PIKAL [mailto:petr.pikal at precheza.cz]
> Sent: Friday, 9 September 2011 09:23
> To: Carlos Ortega
> Cc: Samir Benzerfa; r-help at r-project.org
> Subject: Re: [R] problems with function read.table
> 
> Hi
> 
> 
> >
> > Hi,
> >
> > If you read carefully the help pages for read.table you get this:
> >
> >
> > na.strings: a character vector of strings which are to be interpreted as
> > NA values.
> > Blank fields are also considered to be missing values in logical,
> > integer, numeric and complex fields.
> >
> > So, both NAs and blank fields are considered as NAs directly by
> > read.table.
> >
> > Once you have imported your data, you can modify with any of the
> > string manipulation functions (sub() or gsub()) to change your
> > "#DIV/0!" to the string "NAs". Another option is to manipulate your
> > Excel file and consider the division by zero with a "IF" and get back
> > a NA if that happens.
> 
> The only problem is that in such a case all columns which contain "#DIV/0!"
> are converted to factors, and you need to consider changing them back to
> numeric.
>
> The read.* functions accept as na.strings not only a single value but
> also a vector of values, so you can get rid of all non-numeric and other
> weird Excel values by listing them in na.strings in the read.table call.
> 
> > x <- read.delim("clipboard")
> 
> > str(x)
> 'data.frame':   6 obs. of  3 variables:
>  $ a: int  1 5 9 8 6 3
>  $ b: int  3 5 7 0 NA 6
>  $ r: Factor w/ 5 levels "#DIV/0!","0.333333333",..: 2 4 5 1 1 3
> 
> > y <- read.delim("clipboard", na.strings=c("NA", "#DIV/0!"))
> > str(y)
> 'data.frame':   6 obs. of  3 variables:
>  $ a: int  1 5 9 8 6 3
>  $ b: int  3 5 7 0 NA 6
>  $ r: num  0.333 1 1.286 NA NA ...
> >
> 
> Regards
> Petr
> 
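
For a column that has already been imported as a factor, a possible way back to numeric is sketched below. The factor f is rebuilt by hand to resemble the r column in the str() output (its values are an assumption based on a/b). Calling as.numeric() directly on a factor returns the internal level codes, so the values have to go through as.character() first. In R 4.0.0 and later, read.delim returns such a column as character by default, and only the last two steps are needed.

```r
# Hand-built stand-in for the imported column (assumed values).
f <- factor(c("0.333333333", "1", "1.285714286", "#DIV/0!", "#DIV/0!", "0.5"))

as.numeric(f)    # level codes, NOT the original numbers

# Convert via character, turning the Excel error into NA first.
r_chr <- as.character(f)
r_chr[r_chr == "#DIV/0!"] <- NA
r_num <- as.numeric(r_chr)

r_num
```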
