anteneh asmare
2022-Jun-14 11:28 UTC
[R] Create a categorical variable using the deciles of data
I want to create a categorical variable using the deciles of the
following data frame, to divide the individuals into 10 equal groups.
I tried the following code:
data_catigocal<-data.frame(c(1:50000))
# create categorical vector using deciles
group_vector <-
c('0-10','11-20','21-30','31-40','41-50','51-60','61-70','71-80','81-90','91-100')
# Add categorical variable to the data_catigocal
data_catigocal$decile <- factor(group_vector)
# print data frame
data_catigocal
Can anyone help me with the R code?
Kind regards,
Hana
Ebert,Timothy Aaron
2022-Jun-14 12:28 UTC
[R] Create a categorical variable using the deciles of data
Hana, the "right" answer depends on exactly what you need. Here are
three correct solutions. They use the same basic strategy to give different
results. There are also other approaches in R to get the same outcome. You could
use data_catigocal[i,j] and some for loops.
size1 <- 50000
ngroup <- 10 # note that size1 must be evenly divisible by ngroup
group_size <- size1/ngroup
data_catigocal <- data.frame(c(1:size1))
data_categorical1 <- data_catigocal
# create categorical vector using deciles
group_vector <-
c('0-10','11-20','21-30','31-40','41-50','51-60','61-70','71-80','81-90','91-100')
# Option 1: cycle through the labels row by row
data_categorical1$group_vn <- rep(group_vector, group_size)
# Option 2: sort the labels so that each group forms one contiguous block
option2 <- rep(group_vector, group_size)
option2 <- sort(option2, decreasing=FALSE)
data_categorical2 <- cbind(option2, data_catigocal)
# Option 3: assign the labels to the rows at random
option3 <- rep(group_vector, group_size)
option3a <- sample(option3, size1, replace=FALSE)
data_categorical3 <- cbind(option3a, data_catigocal)
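A side note on option 2: sorting works here because these particular labels happen to sort in group order. As a minimal sketch of an alternative that does not rely on sort order, the each argument of rep() builds the contiguous blocks directly. It assumes the same size1 and group_vector as above; the names blocked and data_blocked are only illustrative.

```r
size1 <- 50000
group_vector <- c('0-10', '11-20', '21-30', '31-40', '41-50',
                  '51-60', '61-70', '71-80', '81-90', '91-100')
ngroup <- length(group_vector)

# each = repeats every label size1/ngroup times before moving to the next label
blocked <- rep(group_vector, each = size1 / ngroup)

# attach the labels to the data, with a named column for convenience
data_blocked <- data.frame(X = 1:size1, group = blocked)

# count the rows assigned to each label
table(data_blocked$group)
```

Like the options above, this still requires size1 to be evenly divisible by the number of groups.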
Tim
Rui Barradas
2022-Jun-14 13:00 UTC
[R] Create a categorical variable using the deciles of data
Hello,
I have recreated the data.frame giving the column a name.
Here are two ways, both based on ?pretty:
data_catigocal <- data.frame(X = 1:50000)
pretty(data_catigocal$X, n = 10)
#> [1] 0 5000 10000 15000 20000 25000 30000 35000 40000 45000 50000
1. Use ?cut to create a factor with 10 levels then assign the labels.
group_vector <-
c('0-10','11-20','21-30','31-40','41-50','51-60','61-70','71-80','81-90','91-100')
data_catigocal$decile <- with(data_catigocal, cut(X, breaks = pretty(X,
n = 10), include.lowest = TRUE))
data_catigocal$decile <- factor(data_catigocal$decile, labels =
group_vector)
head(data_catigocal)
#>   X decile
#> 1 1   0-10
#> 2 2   0-10
#> 3 3   0-10
#> 4 4   0-10
#> 5 5   0-10
#> 6 6   0-10
tail(data_catigocal)
#>           X decile
#> 49995 49995 91-100
#> 49996 49996 91-100
#> 49997 49997 91-100
#> 49998 49998 91-100
#> 49999 49999 91-100
#> 50000 50000 91-100
2. Use ?findInterval to bin the data and coerce to factor with the
appropriate levels.
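data_catigocal$decile <- findInterval(data_catigocal$X,
pretty(data_catigocal$X, n = 10), rightmost.closed = TRUE, left.open = TRUE)
data_catigocal$decile <- factor(data_catigocal$decile, labels =
group_vector)
With left.open = TRUE the intervals are closed on the right, as in cut(),
so the results are the same. (Without it, the values that fall exactly on an
interior break, 5000, 10000, ..., 45000, would land one group higher.)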
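One way to check how the two approaches compare is to build both factors side by side, as in the sketch below. It assumes the same X and labels as above; by_cut, by_find and by_find_open are illustrative names. cut() bins are closed on the right by default, while findInterval() bins are closed on the left unless left.open = TRUE is given, so a default findInterval() call can disagree with cut() at the interior break points.

```r
X <- 1:50000
group_vector <- c('0-10', '11-20', '21-30', '31-40', '41-50',
                  '51-60', '61-70', '71-80', '81-90', '91-100')
breaks <- pretty(X, n = 10)

# cut(): right-closed bins, lowest break included
by_cut <- factor(cut(X, breaks = breaks, include.lowest = TRUE),
                 labels = group_vector)

# findInterval() with its default left-closed bins
by_find <- factor(findInterval(X, breaks, rightmost.closed = TRUE),
                  labels = group_vector)

# findInterval() with right-closed bins, to mirror cut()
by_find_open <- factor(findInterval(X, breaks, rightmost.closed = TRUE,
                                    left.open = TRUE),
                       labels = group_vector)

# values of X where the default findInterval() call disagrees with cut()
X[by_cut != by_find]

# does the left.open = TRUE variant agree with cut() everywhere?
identical(by_cut, by_find_open)
```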
Hope this helps,
Rui Barradas
Richard O'Keefe
2022-Jun-14 13:07 UTC
[R] Create a categorical variable using the deciles of data
Can you explain why you are not using ?quantile to find the deciles then ?cut to construct the factor? What have I misunderstood?
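For reference, the quantile-then-cut approach could look roughly like the sketch below. It is only an illustration: the column name X and the variable decile_breaks are assumptions, and quantile() is used with its default settings.

```r
data_catigocal <- data.frame(X = 1:50000)
group_vector <- c('0-10', '11-20', '21-30', '31-40', '41-50',
                  '51-60', '61-70', '71-80', '81-90', '91-100')

# the 0%, 10%, ..., 100% quantiles of X serve as the bin boundaries
decile_breaks <- quantile(data_catigocal$X, probs = seq(0, 1, by = 0.1))

# include.lowest = TRUE keeps the minimum of X in the first group
data_catigocal$decile <- cut(data_catigocal$X, breaks = decile_breaks,
                             labels = group_vector, include.lowest = TRUE)

# count the rows per decile group
table(data_catigocal$decile)
```

Because the breaks come from the data rather than from round numbers, this also splits values that are not evenly spaced into groups of roughly equal size. With heavily tied data the quantiles can coincide, and cut() stops with an error when breaks are not unique.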