Hello (and Happy New Year),

When I create a factor with labels in the order I want, write the
data as a text file, and then retrieve them, the factor levels are no
longer in the proper order.

Here is what I do (I tried many variations):

# educ is a numeric vector with 1,001 observations.
# There is one NA

# Use educ to create a factor

feducord <- factor(educ, labels = c('Elem', 'Mid', 'HS',
                   'Bus', 'Some', 'Col', 'Post'), ordered = T)

levels(feducord)
[1] "Elem" "Mid"  "HS"   "Bus"  "Some" "Col"  "Post"

table(feducord)
feducord
Elem Mid  HS Bus Some Col Post
  30  90 303 108  236 144   89

# The above is what I want. The frequencies agree with
# the codebook

# Make a data frame and save it. (I want a text file.)

testdf <- data.frame(feducord)
str(testdf)
'data.frame':   1001 obs. of  1 variable:
 $ feducord: Ord.factor w/ 7 levels "Elem"<"Mid"<"HS"<..: 5 6 5 7 3 4 3 3 3 5 ...
write.table(testdf, file = 'Junkarea/test.txt')

# So far, so good.

rm(testdf, feducord)

# Go away.
# Come back later to retrieve the data.

testdf <- read.table(file = 'Junkarea/test.txt')

# But levels are no longer ordered

str(testdf)
'data.frame':   1001 obs. of  1 variable:
 $ feducord: Factor w/ 7 levels "Bus","Col","Elem",..: 7 2 7 6 4 1 4 4 4 7

table(testdf$feducord)
Bus Col Elem  HS Mid Post Some
108 144   30 303  90   89  236

# The frequencies are correct, but the ordering is wrong.

Clearly I am missing something obvious, but I can't see it. If I save
"feducord" and load it, the order of the levels is as it should be.
But I don't know why writing to a test file should change anything.
Any help would be greatly appreciated.

(You're right, I don't have anything better to do on New Year's eve.)

G'day H.T.

On Sat, 1 Jan 2011 00:41:10 -0500 (EST)
"H. T. Reynolds" <htr at udel.edu> wrote:

> When I create a factor with labels in the order I want, write the
> data as a text file, and then retrieve them, the factor levels are no
> longer in the proper order.

Not surprisingly. :)

[..big snip..]

> testdf <- read.table(file = 'Junkarea/test.txt')

Did you look at the file Junkarea/test.txt, e.g. with a text editor?
You will see that write.table() stores only the observed values but no
information about what the mode of each variable in the data frame is.
In particular, it doesn't store that a variable is a factor and
definitely not the levels and their ordering.

Actually, write.table() saves a factor by writing out the observed
labels as character strings. Only because read.table() by default turns
character data into factors (a behaviour that some useRs don't like and
why the option stringsAsFactors exists) you end up with a factor again.

> # But levels are no longer ordered

But the help file of factor (see ?factor) states in the warning
section:

    The levels of a factor are by default sorted, but the sort order
    may well depend on the locale at the time of creation, and should
    not be assumed to be ASCII.

Thus, arguably, the levels are ordered, just not the order you want. :)

> Clearly I am missing something obvious, but I can't see it. If I save
> "feducord" and load it, the order of the levels is as it should be.
> But I don't know why writing to a test file should change anything.
> Any help would be greatly appreciated.

If you save() and load() an object, then it is saved in binary format,
thus much more information about it can be stored. Indeed, I would
expect that a faithful internal representation of the object is stored
in the binary format so that a save(), rm() and load() would restore
exactly the same object.

Saving objects in text formats is prone to loss of information.
E.g., as you experience with write.table() and read.table(), no
information about ordering of levels is stored using these functions.
If care is not taken when writing numbers to text files, the internal
representation can change, e.g.:

R> x <- data.frame(x=seq(from=0, to=1, length=11))
R> write.table(x, file="/tmp/junk1")
R> y <- read.table("/tmp/junk1")
R> identical(x,y)
[1] FALSE
R> x-y
              x
1  0.000000e+00
2  0.000000e+00
3  0.000000e+00
4  5.551115e-17
5  0.000000e+00
6  0.000000e+00
7  1.110223e-16
8  1.110223e-16
9  0.000000e+00
10 0.000000e+00
11 0.000000e+00

Having said all this, if you want to save your data in a text file with
a representation that remembers the ordering of the factor levels, look
at dput():

R> fac <- gl(2,4, labels=c("White", "Black"))
R> fac
[1] White White White White Black Black Black Black
Levels: White Black
R> write.table(fac, file="/tmp/junk")
R> str(read.table("/tmp/junk"))
'data.frame':   8 obs. of  1 variable:
 $ x: Factor w/ 2 levels "Black","White": 2 2 2 2 1 1 1 1
R> dput(fac, file="/tmp/junk")
R> str(dget(file="/tmp/junk"))
 Factor w/ 2 levels "White","Black": 1 1 1 1 2 2 2 2

It might just be that the text representation used by dput() is not
particularly digestible for the human eye. :)

> (You're right, I don't have anything better to do on New Year's eve.)

New Year's eve? The first day of the new year is already nearly
over! :)

HTH.

Cheers,

        Berwin

========================== Full address ============================
Berwin A Turlach                      Tel.: +61 (8) 6488 3338 (secr)
School of Maths and Stats (M019)            +61 (8) 6488 3383 (self)
The University of Western Australia   FAX : +61 (8) 6488 1028
35 Stirling Highway
Crawley WA 6009                e-mail: berwin at maths.uwa.edu.au
Australia                        http://www.maths.uwa.edu.au/~berwin
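
To make the point above concrete: since the text file holds only the
labels, the ordering has to be supplied again after reading. The
following is a minimal sketch, not the poster's actual session. It
assumes a small made-up vector in place of educ (the real data are not
shown in the thread) and a temporary file instead of Junkarea/test.txt.
Note that from R 4.0.0 onwards read.table() defaults to
stringsAsFactors = FALSE, so the column comes back as character unless
asked otherwise; the factor() call below works in either case.

```r
# Level labels in the intended order (taken from the original post).
lev <- c("Elem", "Mid", "HS", "Bus", "Some", "Col", "Post")

# Made-up codes standing in for the poster's educ vector (assumption).
educ <- c(5, 6, 5, 7, 3, 4, 1, 2)
feducord <- factor(educ, levels = 1:7, labels = lev, ordered = TRUE)

# Round trip through a text file: only the labels are written.
tmp <- tempfile(fileext = ".txt")
write.table(data.frame(feducord), file = tmp)
back <- read.table(tmp, stringsAsFactors = TRUE)

# Levels are now in sorted (locale-dependent) order and not "ordered".
levels(back$feducord)
is.ordered(back$feducord)

# Re-impose the intended levels and ordering by hand.
back$feducord <- factor(as.character(back$feducord),
                        levels = lev, ordered = TRUE)
str(back)
```

The level vector has to be kept somewhere (in the script, or in a
second small file), because the data file itself cannot carry it.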

On Jan 1, 2011, at 12:41 AM, H. T. Reynolds wrote:

> Hello (and Happy New Year),
>
> When I create a factor with labels in the order I want, write the
> data as a text file,

Why? What is the reason for this process?

> and then retrieve them, the factor levels are no longer in the
> proper order.

Two further ideas to those offered by Turlach and Bolker:

You can name them with leading digits that ascend in the desired order,

or

I have seen described (but not found a fully worked example, despite
what I thought was an adequate search) the use of an as() method, which
in this instance might apply as.factor with your own level
specification while reading with colClasses.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
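
The second idea (an as() method picked up through colClasses) can be
sketched as follows. This is only one way to do it and not necessarily
the example that was being searched for: the class name "educOrdered"
and the data vector are made up for illustration, and the file is
written with row.names = FALSE so that colClasses applies to the single
data column.

```r
library(methods)  # setClass() and setAs(); attached by default in recent R

# Level labels in the intended order (taken from the original post).
lev <- c("Elem", "Mid", "HS", "Bus", "Some", "Col", "Post")

# Hypothetical class name, used only as a hook for colClasses.
setClass("educOrdered")

# Coercion method: character labels -> ordered factor with our levels.
setAs("character", "educOrdered",
      function(from) factor(from, levels = lev, ordered = TRUE))

# Made-up codes standing in for the poster's educ vector (assumption).
feducord <- factor(c(5, 6, 5, 7, 3, 4, 1, 2), levels = 1:7,
                   labels = lev, ordered = TRUE)

tmp <- tempfile(fileext = ".txt")
write.table(data.frame(feducord), file = tmp, row.names = FALSE)

# read.table() reads the column as character and then calls
# as(column, "educOrdered"), which dispatches to the method above.
back <- read.table(tmp, header = TRUE, colClasses = "educOrdered")
str(back)
```

The advantage over fixing the factor after reading is that the level
specification lives in one place and is applied at read time; the cost
is a small amount of S4 set-up before the read.table() call.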

Thanks to one and all. I now have a better understanding of the
situation.

---- Original message ----
>Date: Sat, 1 Jan 2011 11:03:59 -0500
>From: David Winsemius <dwinsemius at comcast.net>
>Subject: Re: [R] Retrieving Factors with Levels Ordered
>To: htr at UDel.Edu
>Cc: r-help at r-project.org
>
>
>On Jan 1, 2011, at 12:41 AM, H. T. Reynolds wrote:
>
>> Hello (and Happy New Year),
>>
>> When I create a factor with labels in the order I want, write the
>> data as a text file,
>
>Why? What is the reason for this process.
>
>> and then retrieve them, the factor levels are no longer in the
>> proper order.
>
>Two further ideas to those offered by Turlach and Bolker:
>You can name them with leading digits that ascend in the desired order
>or
>I have seen described ( but not found a fully worked example despite
>what I thought was an adequate search) the use of an as() method
>which in this instance might apply as.factor with your own level
>specification while reading with colClasses.