thr3ads.net - R help - [R] read.csv interpreting numbers as factors [Dec 2013]

Bill

2013-Dec-10 09:19 UTC

[R] read.csv interpreting numbers as factors

Why does R interpret a column of numbers in a csv file as a factor when
using read.csv(), and how can I prevent that? The data looks like

9928
3502
146
404
1831
686
249

I tried kick=read.csv("kick.csv",stringsAsFactors =FALSE)
as well as
kick=read.csv("kick.csv")

Thanks


On Mon, Dec 2, 2013 at 5:16 PM, William Dunlap <wdunlap@tibco.com> wrote:
> > It seems so inefficient.
>
> But ifelse knows nothing about the expressions given
> as its second and third arguments -- it only sees their
> values after they are evaluated.  Even if it could see the
> expressions, it would not be able to assume that f(x[i])
> is the same as f(x)[i] or things like
>    ifelse(x>0, cumsum(x), cumsum(-x))
> would not work.
>
> You can avoid computing all of f(x) and then extracting
> a few elements from it by doing something like
>    x <- c("Wednesday", "Monday", "Wednesday")
>    z1 <- character(length(x))
>    z1[x=="Monday"] <- "Mon"
>    z1[x=="Tuesday"] <- "Tue"
>    z1[x=="Wednesday"] <- "Wed"
> or
>    LongDayNames <- c("Monday","Tuesday","Wednesday")
>    ShortDayNames <- c("Mon", "Tue", "Wed")
>    z2 <- character(length(x))
>    for(i in seq_along(LongDayNames)) {
>       z2[x==LongDayNames[i]] <- ShortDayNames[i]
>    }
>
> To avoid the repeated x==value[i] you can use match(x, values).
>    z3 <- ShortDayNames[match(x, LongDayNames)]
>
> z1, z2, and z3 are identical  character vectors.
>
> Or, you can use factors.
>    > factor(x, levels=LongDayNames, labels=ShortDayNames)
>    [1] Wed Mon Wed
>    Levels: Mon Tue Wed
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>
>
> > -----Original Message-----
> > From: r-help-bounces@r-project.org [mailto:r-help-bounces@r-project.org] On Behalf Of Bill
> > Sent: Monday, December 02, 2013 4:50 PM
> > To: Duncan Murdoch
> > Cc: r-help@r-project.org
> > Subject: Re: [R] ifelse -does it "manage the indexing"?
> >
> > It seems so inefficient. I mean the whole first vector will be evaluated.
> > Then if the second if is run the whole vector will be evaluated again. Then
> > if the next if is run the whole vector will be evaluated again. And so on.
> > And this could be only to test the first element (if it is false for each
> > if statement). Then this would be repeated again and again. Is that really
> > the way it works? Or am I not thinking clearly?
> >
> >
> > On Mon, Dec 2, 2013 at 4:48 PM, Duncan Murdoch
> > <murdoch.duncan@gmail.com> wrote:
> >
> > > On 13-12-02 7:33 PM, Bill wrote:
> > >
> > >> ifelse ((day_of_week == "Monday"),1,
> > >>    ifelse ((day_of_week == "Tuesday"),2,
> > >>    ifelse ((day_of_week == "Wednesday"),3,
> > >>    ifelse ((day_of_week == "Thursday"),4,
> > >>    ifelse ((day_of_week == "Friday"),5,
> > >>    ifelse ((day_of_week == "Saturday"),6,7))))))
> > >>
> > >>
> > >>    In code like the above, day_of_week is a vector and so day_of_week ==
> > >> "Monday" will result in a boolean vector. Suppose day_of_week is Monday,
> > >> Thursday, Friday, Tuesday. So day_of_week == "Monday" will be
> > >> True,False,False,False. I think that ifelse will test the first element
> > >> and it will generate a 1. At this point it will not have run day_of_week ==
> > >> "Tuesday" yet. Then it will test the second element of day_of_week and it
> > >> will be false and this will cause it to evaluate day_of_week == "Tuesday".
> > >> My question would be, does the evaluation of day_of_week == "Tuesday"
> > >> result in the generation of an entire boolean vector (which would be in
> > >> this case False,False,False,True) or does the ifelse "manage the indexing"
> > >> so that it only tests the second element of the original vector (which is
> > >> Thursday) and for that matter does it therefore not even bother to
> > >> generate the first boolean vector I mentioned above (True,False,False,False)
> > >> but rather just checks the first element?
> > >>    Not sure if I have explained this well but if you understand I would
> > >> appreciate a reply.
> > >>
> > >
> > > See the help for the function.  If any element of the test is true, the
> > > full first vector will be evaluated.  If any element is false, the second
> > > one will be evaluated.  There are no shortcuts of the kind you describe.
> > >
> > > Duncan Murdoch
> > >
> > >
> >
> >       [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
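The point made in the quoted thread, that ifelse() only ever sees fully evaluated arguments, can be checked with a small experiment. The sketch below is illustrative only: `slow_label` and `calls` are hypothetical names introduced here, and the helper simply records how many elements it was handed each time.

```r
# Hypothetical helper that records how many elements it was asked to process.
calls <- integer(0)
slow_label <- function(v) {
  calls <<- c(calls, length(v))
  toupper(v)
}

x <- c("Wednesday", "Monday", "Wednesday")

# ifelse() receives the already-evaluated result of slow_label(x),
# so the helper is handed all three elements although only one is used.
z <- ifelse(x == "Monday", slow_label(x), x)

# Indexed assignment applies the helper only to the matching elements.
z_sub <- x
hit <- x == "Monday"
z_sub[hit] <- slow_label(x[hit])

print(calls)
```

After both steps, `calls` holds the vector lengths seen by the helper, so the difference between the two approaches shows up as the difference between its two entries.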

Jeff Newmiller

2013-Dec-10 10:06 UTC

[R] read.csv interpreting numbers as factors

It is bad netiquette to hijack an existing thread for a new topic. Please start
a new email thread when changing topics.

If your data really consists of what you show, then read.csv won't behave
that way. I suggest that you open the file in a text editor and look for odd
characters. They may be invisible.

Going out on a limb, you may be trying to read a tab separated file, and if so
then you need to use the sep="\t" argument to read.csv.
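One way to act on this advice without a text editor is to look at the raw lines before they are parsed. The following is a minimal sketch, not a diagnosis of the original file: it writes a small temporary file standing in for kick.csv, and the quoted entry "1,831" is an invented bad value used only to show what the check reports.

```r
# Hypothetical stand-in for kick.csv; the "1,831" entry is an invented bad value.
path <- tempfile(fileext = ".csv")
writeLines(c("9928", "3502", "146", "404", "\"1,831\"", "686", "249"), path)

# Read the file as plain text so nothing is converted yet.
raw_lines <- readLines(path)

# Flag lines that are not an optional sign, digits, and an optional decimal part.
is_plain_number <- grepl("^[-+]?[0-9]+(\\.[0-9]+)?$", trimws(raw_lines))
suspects <- raw_lines[!is_plain_number]
print(suspects)
```

Any line listed in `suspects` is a candidate for what makes read.csv give up on treating the column as numeric. Invisible characters can be made visible by passing a suspect line to charToRaw().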


Barry Rowlingson

2013-Dec-10 10:48 UTC

[R] read.csv interpreting numbers as factors

On Tue, Dec 10, 2013 at 10:06 AM, Jeff Newmiller
<jdnewmil at dcn.davis.ca.us> wrote:
> It is bad netiquette to hijack an existing thread for a new topic. Please
> start a new email thread when changing topics.
>
> If your data really consists of what you show, then read.csv won't
> behave that way. I suggest that you open the file in a text editor and look for
> odd characters. They may be invisible.
>
> Going out on a limb, you may be trying to read a tab separated file, and if
> so then you need to use the sep="\t" argument to read.csv.

Or something in the data isn't a valid number. Try:

as.numeric(as.character(factorthingyouthinkshouldbenumbers))

and if you get any NA values then those things aren't valid number
formats. You need as.numeric(as.character(..)) because otherwise
as.numeric just gets the underlying number codes for the factor
levels.


 > f=factor(c("1","1","2","three","4","69"))
 > f
[1] 1     1     2     three 4     69
Levels: 1 2 4 69 three

 > as.numeric(f)
[1] 1 1 2 5 3 4

 > as.numeric(as.character(f))
[1]  1  1  2 NA  4 69
Warning message:
NAs introduced by coercion
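Building on the conversion above, the entries that fail to convert can be listed directly, which usually shows why the column was read as a factor. This is a sketch using the same toy factor; `as_text`, `as_number` and `bad_values` are names introduced here for illustration.

```r
# Same toy factor as in the example above.
f <- factor(c("1", "1", "2", "three", "4", "69"))

# Convert via character; suppressWarnings() hides the expected coercion warning.
as_text <- as.character(f)
as_number <- suppressWarnings(as.numeric(as_text))

# Entries that were present as text but became NA are not valid numbers.
bad_values <- unique(as_text[is.na(as_number) & !is.na(as_text)])
print(bad_values)
```

Once the offending values are known, they can be corrected in the file or declared as missing through the na.strings argument of read.csv. If the column is meant to be purely numeric, passing colClasses = "numeric" should make the read stop with an error at the first bad value instead of quietly returning a factor.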
