Brian Feeny
2012-Nov-24 04:42 UTC
[R] Building factors across two columns, is this possible?
I am trying to make it so two columns with similar data use the same internal numbers for the same factor levels. Here is the example:

> read.csv("test.csv",header =FALSE,sep=",")
     V1    V2       V3
1   sun  moon    stars
2 stars  moon      sun
3   cat   dog   catdog
4   dog  moon      sun
5  bird plane superman
6  1000   dog     2000
> data <- read.csv("test.csv",header =FALSE,sep=",")
> str(data)
'data.frame':   6 obs. of  3 variables:
 $ V1: Factor w/ 6 levels "1000","bird",..: 6 5 3 4 2 1
 $ V2: Factor w/ 3 levels "dog","moon","plane": 2 2 1 2 3 1
 $ V3: Factor w/ 5 levels "2000","catdog",..: 3 4 2 4 5 1

> as.numeric(data$V1)
[1] 6 5 3 4 2 1
> as.numeric(data$V2)
[1] 2 2 1 2 3 1
> as.factor(data$V1)
[1] sun   stars cat   dog   bird  1000
Levels: 1000 bird cat dog stars sun
> as.factor(data$V2)
[1] moon  moon  dog   moon  plane dog
Levels: dog moon plane

So notice "dog" is 4 in V1, yet it's 1 in V2. Is there a way, either on import or after, to have factors computed for both columns and assigned the same internal values?

Brian
Brian Feeny
2012-Nov-24 07:33 UTC
[R] Building factors across two columns, is this possible?
To clarify my previous post, here is a representation of what I am trying to accomplish. I would like every unique value in any column to be assigned a number, like so:

     V1    V2       V3
1   sun  moon    stars
2 stars  moon      sun
3   cat   dog   catdog
4   dog  moon      sun
5  bird plane superman
6  1000   dog     2000

Level       Value
sun      -> 0
stars    -> 1
cat      -> 2
dog      -> 3
bird     -> 4
1000     -> 5
moon     -> 6
plane    -> 7
catdog   -> 8
superman -> 9
2000     -> 10
etc.

so that internally it is represented as:

  V1 V2 V3
1  0  6  1
2  1  6  0
3  2  3  8
4  3  6  0
5  4  7  9
6  5  3 10

Does this make sense? I am hoping there is a way to accomplish this.

Brian
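The numbering described above follows the order in which each value first appears, scanning V1, then V2, then V3. A minimal sketch of that mapping, assuming the data are rebuilt inline rather than read from test.csv (the names `dat`, `shared_levels` and `codes` are illustrative, not from the original post):

```r
# Rebuild the example data inline (stands in for test.csv).
dat <- data.frame(
  V1 = c("sun", "stars", "cat", "dog", "bird", "1000"),
  V2 = c("moon", "moon", "dog", "moon", "plane", "dog"),
  V3 = c("stars", "sun", "catdog", "sun", "superman", "2000"),
  stringsAsFactors = FALSE
)

# Shared level set: every distinct value, in order of first appearance,
# scanning column by column.
shared_levels <- unique(unlist(lapply(dat, as.character), use.names = FALSE))

# Give every column the same level set so equal strings get equal codes.
dat[] <- lapply(dat, factor, levels = shared_levels)

# R's factor codes start at 1; subtract 1 for the 0-based numbering above.
codes <- sapply(dat, function(x) as.integer(x) - 1L)

codes[, "V2"]  # 6 6 3 6 7 3
```

R itself always stores factor codes starting at 1, so the 0-based values exist only in the derived `codes` matrix, not inside the factors.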
David Winsemius
2012-Nov-24 17:35 UTC
[R] Building factors across two columns, is this possible?
On Nov 23, 2012, at 8:42 PM, Brian Feeny wrote:

> So notice "dog" is 4 in V1, yet its 1 in V2. Is there a way, either
> on import, or after, to have factors computed for both columns and
> assigned the same internal values?

> dat[] <- lapply(dat, function(x) factor(as.character(x), levels = levels(unlist(dat))))
> dat
     V1    V2       V3
1   sun  moon    stars
2 stars  moon      sun
3   cat   dog   catdog
4   dog  moon      sun
5  bird plane superman
6  1000   dog     2000
> levels(dat[[1]])
[1] "1000"     "bird"     "cat"      "dog"      "stars"    "sun"
[7] "moon"     "plane"    "2000"     "catdog"   "superman"

I see your "clarification". Reordering the representation can be done with:

levels(dat) <- <character vector>

--
David Winsemius, MD
Alameda, CA, USA
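For anyone wanting to run the approach above without test.csv, here is a self-contained sketch. It assumes the columns are factors, as in the 2012 session; since R 4.0.0 `read.csv` defaults to `stringsAsFactors = FALSE`, so the sketch sets that argument explicitly. The names `all_levels`, `new_order` and `reordered` are illustrative only.

```r
# Inline copy of the example data, with factor columns as in the thread.
dat <- data.frame(
  V1 = c("sun", "stars", "cat", "dog", "bird", "1000"),
  V2 = c("moon", "moon", "dog", "moon", "plane", "dog"),
  V3 = c("stars", "sun", "catdog", "sun", "superman", "2000"),
  stringsAsFactors = TRUE
)

# unlist() on a list of factors returns a single factor whose levels are
# the union of the individual level sets.
all_levels <- levels(unlist(dat))

# Rebuild each column against the shared level set.
dat[] <- lapply(dat, function(x) factor(as.character(x), levels = all_levels))

# "dog" now has the same internal code in both columns.
as.integer(dat$V1)[4] == as.integer(dat$V2)[3]  # TRUE

# To change the numbering, rebuild the factors with the levels in the
# desired order (reversed here purely as an example).
new_order <- rev(all_levels)
reordered <- dat
reordered[] <- lapply(dat, function(x) factor(as.character(x), levels = new_order))
```

One caution when reordering: assigning a permuted character vector to `levels()` of a factor relabels the existing codes, which changes the values rather than their order, so rebuilding with `factor(..., levels = )` as in the last step is the safer route.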