Hello everybody,

I have a problem with R. I loaded a questionnaire saved as CSV into R and tried to test independence between two variables.

> data <- read.csv("C:/Users/Me/Desktop/data.csv")
> View(data)
> df <- read.csv("C:/Users/Me/Desktop/data.csv")
> ls()
[1] "df" "data"
> attributes(data$gender)
$levels
[1] " F" " M" "F" "M"

$class
[1] "factor"

I tried to change my variable "gender" into a factor using:

data$gender = factor(data$gender, levels = c(1:2), labels = c("F", "M"), exclude = NA, nmax = NA)

Then I typed data$gender and the only thing I got was:

 [1] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
[21] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
[41] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
[61] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
Levels: F M

Does anybody know why?

- The gender column in my CSV file is filled out properly (M = Male, F = Female).
- My imported dataset in R is complete (all values are there).

I have done this with a different Excel document and it worked without any problems. I am really clueless. I can't go further and compare the variables and do t-tests without this working.

Could someone please help me out?

Thank you.

[[alternative HTML version deleted]]
Well, you can help yourself on this list if you stop letting your email client determine the format (HTML in this case) that you use, since that format gets corrupted on this mailing list, leading to frequent misunderstandings. Learn how to make your email client send plain text format.

If you go back to your first line and look at str(data), you will see that read.csv automatically converted the gender column to a factor for you. In your later attempt to convert it, you assumed factor() would draw on the underlying integer values, but a factor "acts" like character data there, so none of the specified levels ("1" or "2") were found in it.

If you want to control the levels used in the factor (as I usually prefer to do), then use either the as.is=TRUE or the stringsAsFactors=FALSE parameter to the read.csv function to make sure no factors are automatically created. Then specify character values for your levels instead of second-guessing R.

Note that there is a bit of an art to reading the help files, as in:

?read.csv

that you should start to practice. When you do read that help file, you will find that there are a lot of parameters to the "read.table" function, and rather fewer specified for the read.csv definition. The reason is that the read.csv function simply calls the read.table function with certain parameters forced to specific values. You can set any of the other parameters that read.table expects in your call to read.csv and they will be passed on to read.table.

Oh, and one other thing: functions are quite similar to data objects in R, and there is a function called "data" that comes with R. While defining your own object called "data" works in this case, it is good practice to learn not to re-use object names like that, since it can make reading your code confusing at the very least.

---------------------------------------------------------------------------
Jeff Newmiller
DCN: <jdnewmil at dcn.davis.ca.us>
Research Engineer (Solar/Batteries/Software/Embedded Controllers)

______________________________________________
R-help at r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
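The advice in Jeff's reply (read the file without automatic factor conversion, then name the levels as character values) might look roughly like the sketch below. The inline CSV text, the object name `survey` and the `id` column are made up for illustration. `strip.white` is not mentioned in the thread; it is used here only as one example of a read.table argument that read.csv passes through.

```r
# Hypothetical inline CSV standing in for the poster's file; two rows carry a
# leading space, mimicking the " F" / " M" values in the original data.
csv_text <- "id,gender
1,F
2, M
3,M
4, F"

# stringsAsFactors = FALSE keeps gender as character; strip.white = TRUE is a
# read.table argument that read.csv hands on unchanged.
survey <- read.csv(text = csv_text, stringsAsFactors = FALSE, strip.white = TRUE)
str(survey)

# Build the factor ourselves, naming the levels as character values.
survey$gender <- factor(survey$gender, levels = c("F", "M"))
table(survey$gender)
```

Reading as character first means nothing is decided silently: the levels are exactly the ones written in the factor() call, and any value that does not match shows up as NA instead of as an extra level.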
Hello,

Just to add that the OP's data$gender has 4 levels, not just 2. So it would be better to remove the leading spaces from " F" and " M", by using ?sub or ?gsub.

Hope this helps,

Rui Barradas
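One way to apply the sub() suggestion directly to the factor's levels is sketched below, as an assumption about how it could be done and not as Rui's own code. The vector `gender` is a hypothetical stand-in for data$gender.

```r
# Hypothetical stand-in for data$gender as read.csv produced it (four levels).
gender <- factor(c(" F", "M", " M", "F", "F", " M"))
levels(gender)

# sub() with an anchored pattern removes only leading blanks; assigning the
# cleaned names back merges levels that have become identical.
levels(gender) <- sub("^ +", "", levels(gender))
levels(gender)
table(gender)
```

Working on levels() touches four strings instead of every observation, and the anchored pattern "^ +" leaves any blanks inside a value alone.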
There are two issues here. First, your original factor seems to have 4 levels: " F", " M", "F", "M". Note the extra space in front of the first two F and M. You may want to fix that first:

gender.fixed = sub(" ", "", as.character(data$gender))

Check that everything is correct by typing

table(gender.fixed)

or

table(data$gender, gender.fixed)

Then you can convert the fixed gender back to a factor, but pay attention to the levels:

data$gender = factor(gender.fixed, levels = c("F", "M"))

Hopefully this works,

Peter
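The cross-tabulation check in Peter's reply is easier to read with a small example. The sketch below runs the same steps on a hypothetical toy data frame called `survey`. It swaps sub() for trimws(), which is in base R from version 3.2.0 and strips leading and trailing whitespace; that substitution is an assumption of this sketch and not part of the reply.

```r
# Hypothetical toy data frame standing in for the poster's `data`.
survey <- data.frame(gender = factor(c(" F", "M", " M", "F", " F")))

# trimws() strips leading and trailing whitespace from each value.
gender.fixed <- trimws(as.character(survey$gender))

# Rows are the old levels, columns the cleaned values: every row should have
# counts in exactly one column.
check <- table(survey$gender, gender.fixed)
print(check)

# Convert back to a factor, naming the two levels explicitly.
survey$gender <- factor(gender.fixed, levels = c("F", "M"))
```

If a row of the cross-tabulation had counts in both columns, the cleaning step would have mapped one original level to two different values, which would signal a mistake in the pattern.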
Try:

ggg <- c("F","M","F","M")
data$gender <- factor(ggg[data$gender])

This in effect converts the (spurious) " F" and " M" levels into "F" and "M" respectively, giving you a factor with the two levels that you really want.

cheers,

Rolf Turner

P. S. *Not* a good idea to use "data" as the name of your data frame. See fortune("dog").

R. T.

--
Technical Editor ANZJS
Department of Statistics
University of Auckland
Phone: +64-9-373-7599 ext. 88276
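The two-line fix above relies on the fact that indexing a vector by a factor uses the factor's integer codes, not its labels. A minimal sketch on a hypothetical vector `gender` follows; the levels are set explicitly so that their order matches the one shown in the original post.

```r
# Hypothetical stand-in for data$gender, with the level order from the thread.
gender <- factor(c(" F", "M", " M", "F"),
                 levels = c(" F", " M", "F", "M"))

# Integer code of each value: its position in levels(gender).
as.integer(gender)

# One replacement label per existing level, in the same order as the levels.
ggg <- c("F", "M", "F", "M")

# Indexing by the factor uses the integer codes, so each value picks up the
# cleaned label that belongs to its level.
gender2 <- factor(ggg[gender])
levels(gender2)
```

The trick only works if `ggg` is written in the same order as levels(gender), so it is worth printing the levels before hard-coding the replacement vector.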
(1) Please keep the discourse on list.

(2) Moral of your story: Don't use Excel --- for *anything*!!!

(3) Why didn't you follow my suggestion?

(4) Naturally you get NAs! There are no levels of "1" or "2" in your data. The levels are "F" and "M", for crying out loud!!! Why *on earth* did you say "levels=c(1:2)"? This could never possibly make any sense at all.

(5) And by the way, why on earth do you write "c(1:2)" rather than just "1:2"? What do you think the "c()" is doing for you? Understand what things *mean*; don't just slap code down and hope.

cheers,

Rolf Turner

On 13/07/15 00:25, Dagmar Juranková wrote:
> Hello.
>
> There was a gap in front of F and M in my Excel table, that's why there
> were more levels of F and M. I corrected it (and saved as csv) and it
> still shows NA NA NA.
>
> > selbst <- read.csv("C:/Users/Dadka/Desktop/Rcsv/doc.ex.csv/selbst.csv")
> > View(selbst)
> > df <- read.csv("C:/Users/Dadka/Desktop/Rcsv/doc.ex.csv/selbst.csv")
> > selbst$q_2
>  [1] F F M F F M F M F M M F F M M F M F F F F F F M M F M F F F M F F F F F F F F F F M F F M F M F F F F
> [52] F F F M M M M F M F F F F F F M F
> Levels: F M
> > attributes(selbst$q_2)
> $levels
> [1] "F" "M"
>
> $class
> [1] "factor"
>
> > selbst$q_2 = factor(selbst$q_2, levels=c(1:2), labels=c("F","M"), exclude=NA, nmax=NA)
> > selbst$q_2
>  [1] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
> [21] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
> [41] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
> [61] <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
> Levels: F M
>
> I think the problem might be in my csv document.
> Screenshot in png format is attached below.
> Could you maybe have a look at it please?
> Thank you.
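The all-NA result discussed in point (4) can be reproduced on a few values. The sketch below uses a hypothetical vector `q_2` in place of selbst$q_2 and shows, under that assumption, why levels = 1:2 matches nothing while character levels keep every value.

```r
# Hypothetical stand-in for selbst$q_2 after the spreadsheet was corrected.
q_2 <- factor(c("F", "F", "M", "F", "M"))

# factor() matches the *values* of its input against `levels`; the values here
# are "F" and "M", so nothing matches 1 or 2 and every entry becomes NA.
bad <- factor(q_2, levels = 1:2, labels = c("F", "M"))
print(bad)

# Naming the levels that actually occur keeps all values.
good <- factor(q_2, levels = c("F", "M"))
print(good)

# The integer codes do exist, but they have to be requested explicitly.
codes <- as.integer(q_2)
print(codes)
```

Since the column is already a factor with levels "F" and "M" after the file was corrected, no second call to factor() is needed at all for this variable.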