anteneh asmare
2022-Jun-14 11:28 UTC
[R] Create a categorical variable using the deciles of data
I want to create a categorical variable using the deciles of the following data frame, to divide the individuals into 10 equal groups. I tried the following code:

data_catigocal <- data.frame(c(1:50000))
# create categorical vector using deciles
group_vector <- c('0-10','11-20','21-30','31-40','41-50','51-60','61-70','71-80','81-90','91-100')
# Add categorical variable to the data_catigocal
data_catigocal$decile <- factor(group_vector)
# print data frame
data_catigocal

Can anyone help me with the R code?

Kind regards,
Hana
Ebert,Timothy Aaron
2022-Jun-14 12:28 UTC
[R] Create a categorical variable using the deciles of data
Hana, the "right" answer depends on exactly what you need. Here are three correct solutions. They use the same basic strategy to give different results. There are also other approaches in R to get the same outcome; for example, you could use data_catigocal[i,j] and some for loops.

size1 <- 50000
ngroup <- 10
# note that size1 must be evenly divisible by ngroup
group_size <- size1/ngroup
data_catigocal <- data.frame(c(1:size1))
data_categorical1 <- data_catigocal
# create categorical vector using deciles
group_vector <- c('0-10','11-20','21-30','31-40','41-50','51-60','61-70','71-80','81-90','91-100')
# option 1: the labels cycle through the rows (row 1 is '0-10', row 2 is '11-20', ...)
data_categorical1$group_vn <- rep(group_vector, group_size)
# option 2: the labels are sorted, so each group is a contiguous block of rows
option2 <- rep(group_vector, group_size)
option2 <- sort(option2, decreasing=FALSE)
data_categorical2 <- cbind(option2, data_catigocal)
# option 3: the labels are shuffled, so rows are assigned to groups at random
option3 <- rep(group_vector, group_size)
option3a <- sample(option3, size1, replace=FALSE)
data_categorical3 <- cbind(option3a, data_catigocal)

Tim
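The divisibility requirement noted in the code above can be relaxed. The following is a minimal sketch, not part of the original reply, assuming near-equal groups are acceptable when the row count is not a multiple of the group count. The row count 50003 and the names n, dat, r and grp are made up for illustration.

```r
# Hypothetical example: 50003 rows do not divide evenly into 10 groups.
n <- 50003
ngroup <- 10
dat <- data.frame(X = seq_len(n))

# Rank the values, then map rank r to group ceiling(ngroup * r / n).
# The resulting group sizes differ by at most one row.
r <- rank(dat$X, ties.method = "first")
dat$grp <- ceiling(ngroup * r / n)

# Show how many rows landed in each group.
table(dat$grp)
```

Because the groups are built from ranks rather than from row positions, the same idea should also work when the column is not already sorted.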
Rui Barradas
2022-Jun-14 13:00 UTC
[R] Create a categorical variable using the deciles of data
Hello,

I have recreated the data.frame, giving the column a name. Here are two ways, both based on ?pretty:

data_catigocal <- data.frame(X = 1:50000)
pretty(data_catigocal$X, n = 10)
#>  [1]     0  5000 10000 15000 20000 25000 30000 35000 40000 45000 50000

1. Use ?cut to create a factor with 10 levels, then assign the labels.

group_vector <- c('0-10','11-20','21-30','31-40','41-50','51-60','61-70','71-80','81-90','91-100')
data_catigocal$decile <- with(data_catigocal, cut(X, breaks = pretty(X, n = 10), include.lowest = TRUE))
data_catigocal$decile <- factor(data_catigocal$decile, labels = group_vector)
head(data_catigocal)
#>   X decile
#> 1 1   0-10
#> 2 2   0-10
#> 3 3   0-10
#> 4 4   0-10
#> 5 5   0-10
#> 6 6   0-10
tail(data_catigocal)
#>           X decile
#> 49995 49995 91-100
#> 49996 49996 91-100
#> 49997 49997 91-100
#> 49998 49998 91-100
#> 49999 49999 91-100
#> 50000 50000 91-100

2. Use ?findInterval to bin the data and coerce to factor with the appropriate levels.

data_catigocal$decile <- findInterval(data_catigocal$X, pretty(data_catigocal$X, n = 10), rightmost.closed = TRUE)
data_catigocal$decile <- factor(data_catigocal$decile, labels = group_vector)

The results are the same except at the interior break points (5000, 10000, ..., 45000): cut() puts each of these in the lower group, while findInterval() puts it in the upper one. Adding left.open = TRUE to the findInterval() call makes the two agree.

Hope this helps,

Rui Barradas
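One way to compare the cut() and findInterval() approaches is to list the rows on which they disagree. This is a small sketch rather than part of the original reply; the object names by_cut, by_fi and by_fi_open are made up for illustration, and the breaks and labels are the ones used in this thread.

```r
X <- 1:50000
breaks <- pretty(X, n = 10)
group_vector <- c('0-10','11-20','21-30','31-40','41-50',
                  '51-60','61-70','71-80','81-90','91-100')

# cut(): intervals are closed on the right by default, e.g. (5000, 10000]
by_cut <- cut(X, breaks = breaks, labels = group_vector, include.lowest = TRUE)

# findInterval(): intervals are closed on the left by default, e.g. [5000, 10000)
by_fi <- factor(findInterval(X, breaks, rightmost.closed = TRUE),
                labels = group_vector)

# The two versions disagree only at the interior break points.
X[as.integer(by_cut) != as.integer(by_fi)]
#> [1]  5000 10000 15000 20000 25000 30000 35000 40000 45000

# With left.open = TRUE, findInterval() uses right-closed intervals like cut().
by_fi_open <- factor(findInterval(X, breaks, rightmost.closed = TRUE,
                                  left.open = TRUE),
                     labels = group_vector)
all(as.integer(by_cut) == as.integer(by_fi_open))
#> [1] TRUE
```

Only 9 of the 50000 rows are affected here, but the difference matters if every group must contain exactly 5000 rows.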
Richard O'Keefe
2022-Jun-14 13:07 UTC
[R] Create a categorical variable using the deciles of data
Can you explain why you are not using ?quantile to find the deciles, then ?cut to construct the factor? What have I misunderstood?
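The quantile-then-cut route suggested here might look like the following. It is a minimal sketch, not code from the thread: it reuses the data frame and labels posted earlier, names the column X, and relies on the default quantile type; decile_breaks is a made-up name.

```r
data_catigocal <- data.frame(X = 1:50000)
group_vector <- c('0-10','11-20','21-30','31-40','41-50',
                  '51-60','61-70','71-80','81-90','91-100')

# The 0%, 10%, ..., 100% quantiles give 11 break points, i.e. 10 intervals.
decile_breaks <- quantile(data_catigocal$X, probs = (0:10) / 10)

# include.lowest = TRUE keeps the minimum value inside the first interval.
data_catigocal$decile <- cut(data_catigocal$X,
                             breaks = decile_breaks,
                             labels = group_vector,
                             include.lowest = TRUE)

# Every group should contain 5000 rows for this data.
table(data_catigocal$decile)
```

Unlike repeating the labels by row position, this bins on the values themselves, so it does not depend on the rows being sorted. One caveat: if the data contain many tied values, some quantiles can coincide, and cut() then stops with an error because the breaks are not unique.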