bgreen at dyson.brisnet.org.au
2006-Sep-28 02:27 UTC
[R] recode problem - unexplained values
I am hoping for some advice regarding the difficulties I have been having recoding variables that are contained in a csv file. Table 1 (below) shows that there are two types of blanks, reported in the first two (unlabelled) columns. I am using Windows XP and the latest version of R.

When blank cells are replaced with a value of n using the syntax

> affect[affect == ""] <- "n"

there are still 3 blank values (Table 2). Applying as.numeric also causes problems, because it generates values of 2, 3 and 4 rather than just 1 and 2.

TABLE 1

> table(group, actions)
     actions
group           n   y
    1 100   2   0   3
    2  30   1   1   0
    3  24   0   0   0

TABLE 2

> table(group, actions)
     actions
group           n   y
    1   0   2 100   3
    2   0   1  31   0
    3   0   0  24   0

Below is another example; for some reason there are two types of 'aobh' values.

> table(group, type)
     type
group aobh aobh  gbh  m uw
    1  104     1   0  0  0
    2    0     0  15  0 17
    3    0     0   0 24  0

Any assistance is much appreciated,

Bob Green
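A minimal reproduction of both symptoms may help. This sketch assumes the column was read as a factor (the read.csv() default at the time) and uses a small hypothetical vector `actions` in place of Bob's data. It shows why recoding "" misses the " " blank, and why as.numeric() returns level codes rather than 1 and 2. The trimws() fix at the end is an assumption of mine, not something proposed in this thread, and needs R >= 3.2.0.

```r
# Hypothetical stand-in for the 'actions' column: "" and " " are two distinct blanks.
# Levels are fixed explicitly so the integer codes do not depend on locale sort order.
actions <- factor(c("", "", " ", "n", "y"), levels = c("", " ", "n", "y"))

# Recoding only the empty string leaves the single-space blank untouched
actions[actions == ""] <- "n"
print(table(actions))  # "" now counts 0, but " " still counts 1

# as.numeric() on a factor returns the internal level codes, hence 2, 3 and 4
codes <- as.numeric(actions)
print(codes)           # 3 3 2 3 4

# Possible fix (assumption, not from the thread): trim whitespace first,
# then rebuild the factor with only the levels you actually want
clean <- trimws(as.character(actions))
clean[clean == ""] <- "n"
clean <- factor(clean, levels = c("n", "y"))
print(as.numeric(clean))  # 1 1 1 1 2
```

Fixing the levels explicitly in factor() is what makes n map to 1 and y to 2; leaving levels to the default sort would also work here, but stating them avoids surprises.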
I can propose a strategy. This example shows that there are different types of blanks when you look at character data:

as.character(c("", " ", " ", " "))

Your test for "" found only one of them. Look at the data as read.csv produces it; that will probably give you some clues:

mydata <- read.csv("filename")
mydata
as.character(mydata)

Rich
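Following Rich's suggestion to look at the data directly, here is a small sketch of how invisible whitespace can be made visible. The vector `x` is hypothetical: print() shows each value in quotes, nchar() counts hidden characters, and a regular expression flags every whitespace-only value that a test for "" misses.

```r
# Hypothetical values of the kind read.csv can produce from a messy file
x <- c("", " ", "  ", "\t", "aobh", "aobh ")

# Only the truly empty string matches ""
is_empty <- x == ""
print(is_empty)

# nchar() exposes hidden characters: "aobh " has 5 characters, not 4
print(nchar(x))

# print() quotes each value, so trailing spaces and tabs become visible
print(unique(x))

# Treat any whitespace-only value (including "") as blank
is_blank <- grepl("^[[:space:]]*$", x)
print(is_blank)
```

The POSIX class [[:space:]] is used instead of \s so the pattern works with R's default regex engine without perl = TRUE.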
Bob,

A quick heads up: my presumption is that "aobh" and "aobh " are different values simply as a consequence of leading/trailing spaces within the delimited fields of the source data file. This is also the likely reason for there being multiple missing/blank values in your imported data set.

Presuming that you used one of the read.table() family of functions (i.e. read.csv()), take note of the 'strip.white' argument in ?read.table, which defaults to FALSE. If you change it to TRUE, the function will strip leading and trailing blanks from unquoted character fields, likely resolving this issue.

HTH,

Marc Schwartz
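A minimal sketch of Marc's suggestion, assuming a small hypothetical CSV passed through read.csv()'s text= argument instead of Bob's file. stringsAsFactors = FALSE is set only so the results can be compared as plain strings; strip.white behaves the same when factors are created.

```r
# Hypothetical CSV: row 2 has a trailing space after "aobh" and a single-space
# 'actions' field. Built with "\n" so the significant spaces survive editing.
csv <- paste("group,type,actions",
             "1,aobh,",
             "1,aobh , ",
             "2,gbh,y",
             sep = "\n")

# Default strip.white = FALSE keeps the spaces: two kinds of "aobh" and of blank
raw <- read.csv(text = csv, stringsAsFactors = FALSE)
print(table(raw$type))

# strip.white = TRUE trims leading/trailing spaces from unquoted character fields
clean <- read.csv(text = csv, stringsAsFactors = FALSE, strip.white = TRUE)
print(table(clean$type))

# With one kind of blank left, the original recode now catches every blank
clean$actions[clean$actions == ""] <- "n"
print(table(clean$actions))
```

Quoted fields such as "aobh " (with the quotes in the file) are not trimmed by strip.white, so they would still need cleaning after import.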