Dear R-users,

I found this not-so-recent post in the archives - http://tolstoy.newcastle.edu.au/R/devel/00a/0291.html - while looking for a way to reorder factor levels. Its author asked whether the read.table function could be modified to order the levels of newly created factors "according to the order that they appear in the data file". That is exactly what I am looking for. As there was no reply to this post, I wonder whether any move has been made towards implementing this suggestion. A quick look at ?read.table tells me that if this option was implemented, it was not in the read.table function...

Sebastien

PS: I am sorry to post so many messages on the list, but I am learning R (basically by trial and error ;-) ) and no one around me has even a slight notion about it...
You can create your own class and pass it to read.table. In the example below, Fld2 is read in with factor levels C, A, B, in that order.

library(methods)
setClass("my.levels")
setAs("character", "my.levels",
      function(from) factor(from, levels = c("C", "A", "B")))

### test ###

Input <- "Fld1 Fld2
10 A
20 B
30 C
40 A
"
DF <- read.table(textConnection(Input), header = TRUE,
                 colClasses = c("numeric", "my.levels"))
str(DF)
# or
DF <- read.table(textConnection(Input), header = TRUE,
                 colClasses = list(Fld2 = "my.levels"))
str(DF)

On 8/28/07, Sébastien <pomchip at free.fr> wrote:
> Dear R-users,
>
> I have found this not-so-recent post in the archives -
> http://tolstoy.newcastle.edu.au/R/devel/00a/0291.html - while I was
> looking for a particular way to reorder factor levels. The question
> addressed by the author was to know if the read.table function could be
> modified to order the levels of newly created factors "according to the
> order that they appear in the data file". Exactly what I am looking for.
> As there was no reply to this post, I wonder if any move have been made
> towards the implementation of this suggestion. A quick look at
> ?read.table tells me that if this option was implemented, it was not in
> the read.table function...
>
> Sebastien
>
> PS: I am sorry to post so many messages on the list, but I am learning R
> (basically by trials & errors ;-) ) and no one around me has even a
> slight notion about it...
>
> ______________________________________________
> R-help at stat.math.ethz.ch mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
Thanks Gabor, I have two questions:

1- Is there any difference between your code and the following one, with regard to Fld2?

### test ###

Input <- "Fld1 Fld2
10 A
20 B
30 C
40 A
"
DF <- read.table(textConnection(Input), header = TRUE)
DF$Fld2 <- factor(DF$Fld2, levels = c("C", "A", "B"))

2- Do you see any way to bring flexibility to your method? As it stands, it looks to me as though I have to i) know the order of my levels before I read the table and ii) create one class per factor.

My problem is that I am not really working on a specific dataset. My goal is to develop R scripts capable of handling datasets which have various contents but similar structures. So I really need to minimize the quantity of "user-specific" code.

Sebastien
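One way to look at question 1 empirically is to build the column both ways and compare the results. The sketch below reuses the "my.levels" class and the Input string from the messages above; the names DF_class and DF_post are only illustrative, and read.table(text = ...) is used in place of textConnection for brevity.

```r
library(methods)

# Same class and coercion as in the reply above.
setClass("my.levels")
setAs("character", "my.levels",
      function(from) factor(from, levels = c("C", "A", "B")))

Input <- "Fld1 Fld2
10 A
20 B
30 C
40 A
"

# Approach 1: the levels are set while the column is being read.
DF_class <- read.table(text = Input, header = TRUE,
                       colClasses = c("numeric", "my.levels"))

# Approach 2: read with the defaults, then rebuild the factor.
DF_post <- read.table(text = Input, header = TRUE)
DF_post$Fld2 <- factor(DF_post$Fld2, levels = c("C", "A", "B"))

# Compare only Fld2: Fld1 is numeric in one data frame and integer in the other.
same_fld2 <- identical(DF_class$Fld2, DF_post$Fld2)
print(same_fld2)
print(levels(DF_class$Fld2))
```

For Fld2 the two routes should give the same factor, so the choice between them is mostly about where the level order gets declared: in the call that reads the file, or in a separate step afterwards.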
It's not clear from your description what you want. Could you be a bit more specific and include an example?
OK, I cannot send you one of my datasets since they are confidential, but I can produce a dummy "mini" dataset to illustrate my question. Let's say I have a csv file with 3 columns and 20 rows, whose content is reproduced by the following line.

> mydata <- data.frame(a = 1:20, b = sample(100:200, 20, replace = T), c = sample(letters[1:26], 20, replace = T))
> mydata
    a   b c
1   1 176 w
2   2 141 k
3   3 172 r
4   4 182 s
5   5 123 k
6   6 153 p
7   7 176 l
8   8 170 u
9   9 140 z
10 10 194 s
11 11 164 j
12 12 100 j
13 13 127 x
14 14 137 r
15 15 198 d
16 16 173 j
17 17 113 x
18 18 144 w
19 19 198 q
20 20 122 f

If I had to read the csv file, I would use something like:

mydata <- data.frame(read.table(file = "c:/test.csv", header = T))

Now, if you look at mydata$c, the levels are alphabetically ordered.

> mydata$c
 [1] w k r s k p l u z s j j x r d j x w q f
Levels: d f j k l p q r s u w x z

What I am trying to do is to reorder the levels so that they are in the order in which they appear in the table, i.e.

Levels: w k r s p l u z j x d q f

Again, keep in mind that my script should be usable on datasets whose content is unknown to me. In my example, I have used letters for mydata$c, but my code may have to handle factors of numeric or character values (I need to transform specific columns of my dataset into factors for plotting purposes). My goal is to let the code scan the content of each factor of my data.frame during or after the read.table step and reorder its levels automatically, without having to ask the user to hard-code the level order.

In a way, my problem is more related to the way the factor levels are ordered than to the read.table function, although I guess there is a link...
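The behaviour described above can be reproduced without a file on disk. The following is a minimal sketch under stated assumptions: csv_text is made-up inline data standing in for the confidential csv file, sep = "," assumes a comma-separated layout, and stringsAsFactors = TRUE is passed explicitly because recent versions of R no longer convert character columns to factors by default.

```r
# Hypothetical inline data standing in for the csv file.
csv_text <- "a,b,c
1,176,w
2,141,k
3,172,r
4,182,s
5,123,k
6,153,p
"

mydata <- read.table(text = csv_text, header = TRUE, sep = ",",
                     stringsAsFactors = TRUE)

# Levels are sorted, whatever the order of the rows in the input.
sorted_levels <- levels(mydata$c)
print(sorted_levels)

# The values themselves are still in file order.
values_in_file_order <- as.character(mydata$c)
print(values_in_file_order)
```

The rows are untouched; only the levels attribute is sorted, which is what plotting functions use to order categories.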
Sébastien,

Does the following work for you?

seb <- read.table(file='clipboard', header=T)
seb$c
 [1] w k r s k p l u z s j j x r d j x w q f
Levels: d f j k l p q r s u w x z
seb$c <- factor(seb$c, levels=unique(seb$c))
seb$c
 [1] w k r s k p l u z s j j x r d j x w q f
Levels: w k r s p l u z j x d q f

Peter Alspach

> -----Original Message-----
> From: r-help-bounces at stat.math.ethz.ch
> [mailto:r-help-bounces at stat.math.ethz.ch] On Behalf Of Sébastien
> Sent: Wednesday, 29 August 2007 9:00 a.m.
> To: Gabor Grothendieck
> Cc: R-help
> Subject: Re: [R] Factor levels
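The same idea can be applied to several columns at once, which is closer to the stated goal of not hard-coding anything per dataset. The sketch below is only an illustration: reorder_by_appearance is a hypothetical helper, not a base R function, and the small data frame is invented. By default it touches every column that is already a factor; passing cols also converts other columns (for example numeric ones) into factors ordered by first appearance.

```r
# Hypothetical helper (not part of base R): rebuild each selected column
# as a factor whose levels follow the order of first appearance.
reorder_by_appearance <- function(df,
                                  cols = names(df)[vapply(df, is.factor, logical(1))]) {
  for (nm in cols) {
    values <- as.character(df[[nm]])
    # unique() keeps the first occurrence of each value, in order;
    # missing values are dropped from the levels.
    df[[nm]] <- factor(values, levels = unique(values[!is.na(values)]))
  }
  df
}

# Invented example data: one factor column and one numeric column.
mydata <- data.frame(
  a = 1:6,
  c = factor(c("w", "k", "r", "s", "k", "p")),
  d = c(30, 10, 30, 20, 10, 20)
)

out <- reorder_by_appearance(mydata, cols = c("c", "d"))
print(levels(out$c))
print(levels(out$d))
```

Going through as.character() first means numeric columns end up with their printed values as level labels, which is usually what is wanted for plotting but does discard the numeric type.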
Peter, Gabor: thanks to both of you. This 'unique' function is what I was looking for!
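For completeness, the two suggestions in this thread can probably be combined so that the ordering happens while the file is read, which is what the original question asked for. This is a sketch, not something proposed by the posters: "appearance.factor" is a hypothetical class name, and the inline Input data is made up for the example.

```r
library(methods)

# Hypothetical class name; it only serves as a label for colClasses.
setClass("appearance.factor")
setAs("character", "appearance.factor",
      function(from) factor(from, levels = unique(from)))

Input <- "Fld1 Fld2
10 B
20 C
30 A
40 B
"

# No level order is hard-coded: it is taken from the data as it is read.
DF <- read.table(text = Input, header = TRUE,
                 colClasses = c("numeric", "appearance.factor"))
print(levels(DF$Fld2))
```

One class definition is enough for any number of columns, but the user still has to say which columns should be read this way, so it trades the post-processing step for a colClasses argument.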