Dimitri Liakhovitski
2011-May-23 15:48 UTC
[R] weird problem - R is not finding the data for the factor level present in the data
Sorry for no code - but it's more of a general question.

I have read in a data frame ("|"-delimited, .txt):

daily<-read.table(file="filename.txt",sep="|",header=T)

One of the variables is a factor with 110 levels:

>str(daily$dma_id)
 Factor w/ 110 levels "500","501","503",...

108 levels of this factor happen to be numbers: "500", "501", "503", ... "880", "881".
But the last 2 levels are strings:

>levels(daily$dma_id)[109:110]
[1] "OH1054" "PA2207"

I checked in the raw data file (.txt) that there are no spaces in these last 2 levels and that there is nothing weird. There is nothing.

When I do the following with any other level of that factor, I get the data I am expecting:

>daily$dma_id[daily$dma_id %in% levels(daily$dma_id)[108]]
[1] "881" "881" "881" - etc.

But when I do the same for the last two levels, I get nothing, for example:

>daily$dma_id[daily$dma_id %in% "OH1054"] # - or:
>daily$dma_id[daily$dma_id %in% levels(daily$dma_id)[109]]
factor(0)

As a result I can't rename those levels.

It's not a real problem - I can do a replace in the raw txt file. But still, why is it happening?

Thank you!

--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com
Dimitri Liakhovitski
2011-May-23 15:54 UTC
[R] weird problem - R is not finding the data for the factor level present in the data
Sorry - please ignore. I've rerun it from scratch, and it worked this time!
D.

On Mon, May 23, 2011 at 11:48 AM, Dimitri Liakhovitski <dimitri.liakhovitski at gmail.com> wrote:
> Sorry for no code - but it's a more of a general question.
> I have read in a data frame ("|"-delimited, .txt).
> daily<-read.table(file="filename.txt",sep="|",header=T)
>
> One of the variables is a factor with 110 levels:
>>str(daily$dma_id)
>  Factor w/ 110 levels "500","501","503",...
>
> 108 levels of this factor happen to be numbers "500", "501", "503",
> ... "880","881"
> But the last 2 levels are strings:
>
>>levels(daily$dma_id)[109:110]
> [1] "OH1054" "PA2207"
>
> I checked in the raw data file (.txt) that there are no spaces in
> these last 2 levels, that there is nothing weird. There is nothing.
>
> When I do the following with any level of that factor, I get the data
> I am expecting:
>>daily$dma_id[daily$dma_id %in% levels(daily$dma_id)[108]]
> [1] "881" "881" "881" - etc.
>
> But when I do the same for the last two levels, I get nothing, for example:
>>daily$dma_id[daily$dma_id %in% "OH1054"] # - or:
>>daily$dma_id[daily$dma_id %in% levels(daily$dma_id)[109]]
> factor(0)
>
> As a result I can't rename those levels.
>
> It's not a real problem - I can do replace in the raw txt file. But
> still, why is it happening?
>
> Thank you!
>
> --
> Dimitri Liakhovitski
> Ninah Consulting
> www.ninah.com

--
Dimitri Liakhovitski
Ninah Consulting
www.ninah.com
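The thread does not say what actually caused the symptom, so the following is only a sketch of one common way to see a level listed by levels() while %in% returns factor(0): the factor carries unused levels left over from an earlier subset. The data frame `full`, its `value` column, and the values in it are hypothetical stand-ins, not the poster's data.

```r
# Hypothetical stand-in for the poster's data; all values are illustrative.
full <- data.frame(
  dma_id = factor(c("500", "501", "881", "OH1054", "PA2207")),
  value  = 1:5
)

# Subsetting rows keeps the factor's original level set.
daily <- full[full$dma_id %in% c("500", "501", "881"), ]

levels(daily$dma_id)                      # still lists "OH1054" and "PA2207"
daily$dma_id[daily$dma_id %in% "OH1054"]  # zero-length factor: no such rows

# Levels that are declared but have no rows in the data.
unused <- names(which(table(daily$dma_id) == 0))

# droplevels() discards levels that no longer occur.
daily$dma_id <- droplevels(daily$dma_id)
levels(daily$dma_id)
```

If `unused` comes back non-empty on real data, the levels are stale rather than mistyped, which would also be consistent with the problem disappearing after re-reading the file from scratch.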