Hi all,

A simple question that I thought I had the answer to, but it isn't so simple for some reason. I am sure someone can easily help. I would like to categorize the values in NP into one of the five values in "Per", with the last category ("4") representing values >= 4 (hence 4:max(NP)). The problem is that R is reading max(NP) as multiple values instead of a range, so the lengths of the labels and the breaks do not match. Suggestions?

Per <- c("NA", "1", "2", "3", "4")

NP = c(1, 1, 2, 1, 1, 2, 2, 1, 4, 1, 0, 5, 3, 3, 1, 5, 3, 5, 1, 6, 1, 2, 2, 2,
       4, 4, 1, 2, 1, 3, 3, 1, 2, 2, 1, 2, 1, 2,
       2, 3, 1, 1, 4, 4, 1, 1, 1, 2, 2, 2)

Person_CAT <- cut(NP, breaks=c(0,1,2,3,4:max(NP)), labels=Per)

--
View this message in context: http://www.nabble.com/truncating-values-into-separate-categories-tp24749046p24749046.html
Sent from the R help mailing list archive at Nabble.com.
Bill.Venables at csiro.au
2009-Jul-31 01:31 UTC
[R] truncating values into separate categories
Here is a suggestion:

> Per <- c("NA", "1", "2", "3","4")
> NP <- c(1, 1, 2, 1, 1, 2, 2, 1, 4, 1, 0, 5,
+         3, 3, 1, 5, 3, 5, 1, 6, 1, 2, 2, 2,
+         4, 4, 1, 2, 1, 3, 3, 1, 2, 2, 1, 2, 1, 2,
+         2, 3, 1, 1, 4, 4, 1, 1, 1, 2, 2, 2)
> Person_CAT <- cut(NP, breaks = c(0:4, Inf)-0.5, labels = Per)
> table(Person_CAT)
Person_CAT
NA  1  2  3  4
 1 19 15  6  9
>

You should be aware, though, that items corresponding to the level "NA" will NOT be treated as missing.

Bill Venables
http://www.cmis.csiro.au/bill.venables/

-----Original Message-----
From: r-help-bounces at r-project.org [mailto:r-help-bounces at r-project.org] On Behalf Of PDXRugger
Sent: Friday, 31 July 2009 9:54 AM
To: r-help at r-project.org
Subject: [R] truncating values into separate categories

______________________________________________
R-help at r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
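The caveat about the "NA" level can be checked directly. What follows is only a minimal sketch, assuming the same NP and Per as in the suggestion above; the names n_missing_before, n_missing_after and Person_CAT2, and the step that rebuilds the factor without the "NA" level, are illustrative additions and not part of the original suggestion.

```r
NP <- c(1, 1, 2, 1, 1, 2, 2, 1, 4, 1, 0, 5,
        3, 3, 1, 5, 3, 5, 1, 6, 1, 2, 2, 2,
        4, 4, 1, 2, 1, 3, 3, 1, 2, 2, 1, 2, 1, 2,
        2, 3, 1, 1, 4, 4, 1, 1, 1, 2, 2, 2)
Per <- c("NA", "1", "2", "3", "4")

# Shifting the breaks by 0.5 puts every integer strictly inside one interval:
# (-0.5, 0.5], (0.5, 1.5], (1.5, 2.5], (2.5, 3.5], (3.5, Inf]
Person_CAT <- cut(NP, breaks = c(0:4, Inf) - 0.5, labels = Per)

# The label "NA" is an ordinary character level, so nothing counts as missing
n_missing_before <- sum(is.na(Person_CAT))

# Illustrative follow-up (an assumption, not from the thread): rebuild the
# factor with only the levels "1".."4", so the old "NA" level becomes a real NA
Person_CAT2 <- factor(Person_CAT, levels = Per[-1])
n_missing_after <- sum(is.na(Person_CAT2))
```

If a genuine missing value is wanted for the zero, rebuilding the factor as above is one option; another is to choose breaks that simply leave 0 outside every interval.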
It appears that your difficulty lies in miscounting the number of intervals.

> cut(NP, breaks=c(0,1,2,3,4,max(NP)))
 [1] (0,1] (0,1] (1,2] (0,1] (0,1] (1,2] (1,2] (0,1] (3,4] (0,1] <NA>  (4,6] (2,3] (2,3] (0,1]
[16] (4,6] (2,3] (4,6] (0,1] (4,6] (0,1] (1,2] (1,2] (1,2] (3,4] (3,4] (0,1] (1,2] (0,1] (2,3]
[31] (2,3] (0,1] (1,2] (1,2] (0,1] (1,2] (0,1] (1,2] (1,2] (2,3] (0,1] (0,1] (3,4] (3,4] (0,1]
[46] (0,1] (0,1] (1,2] (1,2] (1,2]
Levels: (0,1] (1,2] (2,3] (3,4] (4,6]
> cut(NP, breaks=c(0,1,2,3,max(NP)),labels=c("1","2","3","4+"))
 [1] 1    1    2    1    1    2    2    1    4+   1    <NA> 4+   3    3    1    4+   3    4+
[19] 1    4+   1    2    2    2    4+   4+   1    2    1    3    3    1    2    2    1    2
[37] 1    2    2    3    1    1    4+   4+   1    1    1    2    2    2
Levels: 1 2 3 4+
> a=cut(NP, breaks=c(0,1,2,3,max(NP)),labels=c("1","2","3","4+"))
> table(a,exclude=NULL)
a
   1    2    3   4+ <NA>
  19   15    6    9    1

Generally it is better to let R keep track of the NA's for you.

albyn
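The interval count can be made explicit. This is only a rough sketch, assuming the NP and Per from the original post; the names bad_breaks, good_breaks and original_call_fails are made up for illustration.

```r
NP <- c(1, 1, 2, 1, 1, 2, 2, 1, 4, 1, 0, 5,
        3, 3, 1, 5, 3, 5, 1, 6, 1, 2, 2, 2,
        4, 4, 1, 2, 1, 3, 3, 1, 2, 2, 1, 2, 1, 2,
        2, 3, 1, 1, 4, 4, 1, 1, 1, 2, 2, 2)
Per <- c("NA", "1", "2", "3", "4")

# 4:max(NP) is a sequence (4, 5, 6 here), not a single open-ended break
bad_breaks  <- c(0, 1, 2, 3, 4:max(NP))
good_breaks <- c(0, 1, 2, 3, max(NP))

# k break points define k - 1 intervals, and cut() wants one label per interval
n_bad  <- length(bad_breaks) - 1    # more intervals than the 5 labels in Per
n_good <- length(good_breaks) - 1   # matches c("1", "2", "3", "4+")

# The original call therefore stops with an error
original_call_fails <- inherits(
  try(cut(NP, breaks = bad_breaks, labels = Per), silent = TRUE),
  "try-error")

# With matching counts the call goes through; 0 lies outside (0, 1] and
# becomes a real NA without needing a label of its own
a <- cut(NP, breaks = good_breaks, labels = c("1", "2", "3", "4+"))
```

Using Inf instead of max(NP) as the last break would give the same grouping for these data and would not depend on the largest observed value.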
I must apologize, as I wasn't clear about what I wanted to occur. I don't want to count the occurrences but rather recode them. I am trying to replace all of the values with the new coded values in Person_CAT. So

NP <- c(1, 1, 2, 1, 1, 2, 2, 1, 4, 1, 0, 5,
        3, 3, 1, 5, 3, 5, 1, 6, 1, 2, 2, 2,
        4, 4, 1, 2, 1, 3, 3, 1, 2, 2, 1, 2, 1, 2,
        2, 3, 1, 1, 4, 4, 1, 1, 1, 2, 2, 2)

and Person_CAT: 1, 1, 2, 1, 1, 2, 2, 1, 4, 1, NA, 4..... and so on. This task would easily be done in SPSS, but I am trying to automate it using R. I hope this is more clear.
On Jul 31, 2009, at 2:55 PM, PDXRugger wrote:

> I must apologize, as I wasn't clear about what I wanted to occur. I don't
> want to count the occurrences but rather recode them. I am trying to
> replace all of the values with the new coded values in Person_CAT. So
>
> NP <- c(1, 1, 2, 1, 1, 2, 2, 1, 4, 1, 0, 5,
>         3, 3, 1, 5, 3, 5, 1, 6, 1, 2, 2, 2,
>         4, 4, 1, 2, 1, 3, 3, 1, 2, 2, 1, 2, 1, 2,
>         2, 3, 1, 1, 4, 4, 1, 1, 1, 2, 2, 2)
>
> and Person_CAT: 1, 1, 2, 1, 1, 2, 2, 1, 4, 1, NA, 4..... and so on. This
> task would easily be done in SPSS, but I am trying to automate it using R.
> I hope this is more clear.

Perhaps:

?cut   # with special attention to the "right" parameter, which is set to TRUE by default.

> per_Cat <- cut(NP, breaks= c(1:4, Inf), right= FALSE)
> per_Cat
 [1] [1,2)   [1,2)   [2,3)   [1,2)   [1,2)   [2,3)   [2,3)   [1,2)   [4,Inf) [1,2)   <NA>    [4,Inf)
[13] [3,4)   [3,4)   [1,2)   [4,Inf) [3,4)   [4,Inf) [1,2)   [4,Inf) [1,2)   [2,3)   [2,3)   [2,3)
[25] [4,Inf) [4,Inf) [1,2)   [2,3)   [1,2)   [3,4)   [3,4)   [1,2)   [2,3)   [2,3)   [1,2)   [2,3)
[37] [1,2)   [2,3)   [2,3)   [3,4)   [1,2)   [1,2)   [4,Inf) [4,Inf) [1,2)   [1,2)   [1,2)   [2,3)
[49] [2,3)   [2,3)
Levels: [1,2) [2,3) [3,4) [4,Inf)
> Per <- c( "1", "2", "3","4")
> levels(per_Cat) <- Per
> per_Cat
 [1] 1    1    2    1    1    2    2    1    4    1    <NA> 4    3    3    1    4    3    4    1    4
[21] 1    2    2    2    4    4    1    2    1    3    3    1    2    2    1    2    1    2    2    3
[41] 1    1    4    4    1    1    1    2    2    2
Levels: 1 2 3 4

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT
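Since the stated goal is recoded values rather than a factor, the result above can be taken one step further. The lines below are a hedged sketch, not something posted in the thread: the conversion through as.character() and the pmin()/ifelse() alternative are assumptions about what "recode" means here, and Person_CAT2 is an illustrative name.

```r
NP <- c(1, 1, 2, 1, 1, 2, 2, 1, 4, 1, 0, 5,
        3, 3, 1, 5, 3, 5, 1, 6, 1, 2, 2, 2,
        4, 4, 1, 2, 1, 3, 3, 1, 2, 2, 1, 2, 1, 2,
        2, 3, 1, 1, 4, 4, 1, 1, 1, 2, 2, 2)

# Left-closed intervals [1,2), [2,3), [3,4), [4,Inf); 0 falls outside and is NA
per_Cat <- cut(NP, breaks = c(1:4, Inf), right = FALSE,
               labels = c("1", "2", "3", "4"))

# Go through as.character() so the labels are converted to numbers,
# rather than the factor's internal integer codes
Person_CAT <- as.numeric(as.character(per_Cat))

# Alternative without cut(), valid only because the categories coincide with
# the integer values: cap at 4 and set anything below 1 to NA
Person_CAT2 <- ifelse(NP < 1, NA, pmin(NP, 4))
```

The cut() route generalizes to unequal or non-integer bins; the pmin() shortcut is shorter but works only for this particular coding.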